Balancing Innovation With Safety & Privacy in the Era of Large Language Models (LLM)

A Guide to Implementing Safety and Privacy Mechanisms for Your Generative AI Applications

Photo by Jason Dent on Unsplash

The AI era has brought Large Language Models (aka LLMs) to the technological forefront. They were much of the talk of 2023 and are likely to remain so for many years to come. LLMs are the AI models that power products like ChatGPT. Fueled by vast amounts of data and computational prowess, these models have unlocked remarkable capabilities, from human-like text generation to natural language understanding (NLU) tasks. They have quickly become the foundation upon which countless applications and software services are being built, or at least augmented.

However, as with any groundbreaking innovation, the rise of LLMs raises a critical question: “How do we balance the pursuit of technological advancement with the imperative of safety and privacy?” This is not merely a philosophical question, but a challenge that requires proactive and thoughtful action.

Safety and privacy

To prioritize safety and privacy in our LLM-powered applications, we’ll be homing in on key areas, including controlling the spread of personal data (personally identifiable information, a.k.a. PII) and of harmful or toxic content. This is essential whether you’re fine-tuning an LLM with your own dataset or simply using an LLM for text generation tasks. Why does this matter? There are several reasons:

- Compliance with government regulations that mandate the protection of users’ personal information (such as GDPR, CCPA, the HIPAA Privacy Rule, etc.)
- Compliance with the LLM provider’s End-User License Agreement (EULA) or Acceptable Use Policy (AUP)
- Compliance with InfoSec policies set within organizations
- Mitigating bias and skew in your model after fine-tuning
- Ensuring the ethical use of LLMs and preserving brand reputation
- Being prepared for any AI regulation that may be on the horizon

Considerations for fine-tuning

When preparing to fine-tune an LLM, the first step is data preparation. Outside of research, education, or personal projects, your training data is likely to contain PII. Data preparation therefore has two steps: first, identify the PII entities present in the data; second, scrub the data so that those entities are properly anonymized.

LLM fine tuning
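
The identify-then-scrub steps can be sketched as a small pass over a fine-tuning dataset. This is a minimal illustration under stated assumptions, not the pipeline used in this article: `detect_pii` is a hypothetical stand-in that only finds US-style phone numbers and SSNs with regular expressions, where a real pipeline would plug in an NER model or Presidio. Step one also counts what was found, which is handy for auditing the dataset before training.

```python
import re
from collections import Counter

# Hypothetical stand-in detector covering only two PII types with regexes;
# in practice this would be an NER model or Presidio's AnalyzerEngine.
PATTERNS = {
    "PHONE_NUMBER": re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}-\d{4}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_pii(text):
    """Step 1 (identify): return (entity_type, start, end) spans found in text."""
    return [
        (entity_type, match.start(), match.end())
        for entity_type, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]


def scrub(text, spans):
    """Step 2 (scrub): replace each span with a placeholder, last span first."""
    # Working right to left keeps the offsets of earlier spans valid.
    for entity_type, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        text = text[:start] + f"[{entity_type}]" + text[end:]
    return text


def scrub_dataset(records):
    """Identify and scrub PII in every record, counting what was found."""
    found = Counter()
    cleaned = []
    for record in records:
        spans = detect_pii(record)
        found.update(entity_type for entity_type, _, _ in spans)
        cleaned.append(scrub(record, spans))
    return cleaned, found


records = [
    "Call me at 555-123-1290 tomorrow.",
    "My SSN is 338-12-6789.",
    "No personal data here.",
]
cleaned, found = scrub_dataset(records)
print(cleaned)
print(found)
```

The square-bracket placeholder is only one choice; the considerations for anonymization later in this post discuss why labels like these can themselves influence fine-tuning.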

Considerations for text generation

For text generation using LLMs, there are a couple of things to keep in mind. First, we ensure that prompts containing toxic content are blocked from reaching the LLM; second, we ensure that the prompt is free of any PII entities. In some cases, it may also be appropriate to run these validations on the text generated by the LLM, the “machine-generated text”. This gives a dual layer of protection for our ethos of safety and privacy. A third aspect is determining the intent of the prompt itself, which may, to some extent, curtail things like prompt injection attacks. However, I will focus primarily on PII and toxicity in this article and discuss intent classification and its effect on LLMs separately.

Text generation with LLM
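
The dual layer of protection can be sketched as a wrapper that runs the same check on the prompt before the model sees it and on the machine-generated text afterwards. Everything named here is a hypothetical placeholder: `toy_check` (a word blocklist) stands in for the toxicity and PII detectors discussed below, and `fake_llm` stands in for a real model call.

```python
from typing import Callable, Optional


class GuardrailViolation(Exception):
    """Raised when a prompt or a model response fails a check."""


def guarded_generate(prompt: str,
                     llm: Callable[[str], str],
                     check: Callable[[str], Optional[str]],
                     check_output: bool = True) -> str:
    """Run `check` on the prompt, call the LLM, then optionally check its output."""
    # Layer 1: validate the prompt before it ever reaches the LLM.
    problem = check(prompt)
    if problem:
        raise GuardrailViolation(f"prompt rejected: {problem}")
    response = llm(prompt)
    # Layer 2: validate the machine-generated text before returning it.
    if check_output:
        problem = check(response)
        if problem:
            raise GuardrailViolation(f"response rejected: {problem}")
    return response


# Toy placeholders so the sketch runs; swap in real detectors and a real model.
BLOCKLIST = {"kill", "hate"}


def toy_check(text: str) -> Optional[str]:
    hits = BLOCKLIST.intersection(text.lower().split())
    return f"blocked words {sorted(hits)}" if hits else None


def fake_llm(prompt: str) -> str:
    return f"Echo: {prompt}"


print(guarded_generate("Summarize this review", fake_llm, toy_check))
```

Keeping the check as a parameter means the same wrapper works whether the check is a toxicity classifier, a PII detector, or both combined.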


We will take a two-step approach to implement a solution. First, we use a named entity recognition (NER) model that can identify PII entities in the text so that we can anonymize them. PII entities usually include things like person names, locations or addresses, phone numbers, credit card numbers, SSNs, and so on. Second, we use a text classification model to classify whether a text is toxic or neutral. Toxic text typically contains abuse, obscenities, harassment, bullying, and so forth.

For the PII NER model, the most common choice would be a BERT base model [1] fine-tuned to detect specific PII entities. You can also fine-tune pre-trained transformer models such as Robust DeID (de-identification), a RoBERTa model fine-tuned for de-identification of medical notes that mostly focuses on protected health information (a.k.a. PHI). A much simpler option to begin experimenting with is spaCy’s EntityRecognizer (ER).

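import spacy
from spacy import displacy

# Load spaCy's large English pipeline
# (install it first with: python -m spacy download en_core_web_lg)
nlp = spacy.load("en_core_web_lg")
text = ("Applicant’s name is John Doe and he lives in Silver St. "
        "and his phone number is 555-123-1290")
doc = nlp(text)

# Highlight the detected entities inline in a Jupyter notebook
displacy.render(doc, style="ent", jupyter=True)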

which gives us

Annotation of PII entities detected by spaCy

spaCy’s EntityRecognizer was able to identify three entities: PERSON (people, including fictional characters), FAC (facilities such as buildings, airports, highways, and bridges; here, a street), and CARDINAL (numerals that do not fall under another type). spaCy also gives us the start and end offsets (character positions in the text) of each detected entity, which we can use to perform anonymization.

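# Character offsets (start, end) of every entity spaCy detected
ent_positions = [(ent.start_char, ent.end_char) for ent in doc.ents]

# Replace each entity with '#' characters of the same length; going from the
# last entity to the first keeps earlier offsets valid even if lengths changed
for start, end in reversed(ent_positions):
    text = text[:start] + "#" * (end - start) + text[end:]

print(text)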


which gives us

Applicant’s name is ######## and he lives in ########## and his phone number is ###-123-1290

But there are a few obvious issues here. spaCy’s default entity list does not cover all types of PII. For example, we would like 555-123-1290 to be detected as a PHONE_NUMBER, rather than having only part of it labeled as CARDINAL, which leads to incomplete entity detection. Of course, just like transformer-based NER models, spaCy can also be trained on your own dataset of custom named entities to make it more robust. However, we will use the open-source Presidio SDK, a more purpose-built toolkit for data protection and de-identification.

PII detection and anonymization with Presidio

The Presidio SDK provides a full set of PII detection capabilities, with a long list of supported PII entities. Presidio primarily uses pattern matching, along with the ML capabilities of spaCy and Stanza. Presidio is also customizable: it can be plugged into your own transformer-based PII entity recognition model, or even into cloud-based PII services such as Azure Text Analytics PII detection or Amazon Comprehend PII detection. It also comes with a built-in, customizable anonymizer that can help scrub and redact PII entities from text.
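
To make the pattern-matching idea concrete, here is a toy recognizer in plain Python. It is not Presidio’s implementation or API; it only illustrates the general approach: a regular expression proposes candidates with a base confidence score, and the score is raised when a context word such as "phone" appears shortly before the match. The regex, the base score of 0.4, the boost of 0.4, and the 30-character window are all illustrative assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    entity_type: str
    start: int
    end: int
    score: float


class ToyPatternRecognizer:
    """Regex recognizer with a context-word score boost (illustrative only)."""

    def __init__(self, entity_type, pattern, base_score, context_words,
                 boost=0.4, window=30):
        self.entity_type = entity_type
        self.pattern = re.compile(pattern)
        self.base_score = base_score
        self.context_words = [w.lower() for w in context_words]
        self.boost = boost
        self.window = window

    def analyze(self, text):
        findings = []
        for match in self.pattern.finditer(text):
            score = self.base_score
            # Look for a context word in the characters just before the match.
            preceding = text[max(0, match.start() - self.window):match.start()].lower()
            if any(word in preceding for word in self.context_words):
                score = min(1.0, score + self.boost)
            findings.append(Finding(self.entity_type, match.start(), match.end(), score))
        return findings


phone = ToyPatternRecognizer("PHONE_NUMBER", r"\b\d{3}-\d{3}-\d{4}\b",
                             base_score=0.4, context_words=["phone", "call", "mobile"])
findings = phone.analyze("and his phone number is 555-123-1290. Order id 555-987-0000.")
for finding in findings:
    print(finding)
```

Both numbers match the pattern, but only the first is preceded by "phone", so it scores higher. That difference is what a score threshold can later act on, and it is also why pure pattern matching needs context to avoid false positives like the order id.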

from presidio_analyzer import AnalyzerEngine

text = """
Applicant’s name is John Doe and he lives in Silver St.
and his phone number is 555-123-1290.
"""

analyzer = AnalyzerEngine()
# Detect PII entities in English text
results = analyzer.analyze(text=text, language="en")
for result in results:
    print(f"PII Type={result.entity_type},",
          f"Start={result.start},",
          f"End={result.end},",
          f"Score={result.score}")

which gives us

PII Type=PERSON, Start=21, End=29, Score=0.85
PII Type=LOCATION, Start=46, End=56, Score=0.85
PII Type=PHONE_NUMBER, Start=81, End=93, Score=0.75


Annotation of PII entities detected by Presidio

As we’ve seen before, anonymizing the text is a rather trivial task, since we have the start and end offsets of each entity. However, we will use Presidio’s built-in AnonymizerEngine instead.

from presidio_anonymizer import AnonymizerEngine

anonymizer = AnonymizerEngine()
# By default, each detected entity is replaced with its <ENTITY_TYPE> label
anonymized_text = anonymizer.anonymize(text=text, analyzer_results=results)
print(anonymized_text.text)

which gives us

Applicant’s name is <PERSON> and he lives in <LOCATION>
and his phone number is <PHONE_NUMBER>.
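
Working directly from offsets is easy as long as spans never overlap; when two recognizers flag overlapping text, you need a policy for which one wins. Below is a minimal sketch of one simple policy (keep the higher-scoring span) followed by label replacement. It works on plain `(entity_type, start, end, score)` tuples rather than Presidio’s result objects, and both that tuple format and the low-confidence US_BANK_NUMBER hit in the example are illustrative assumptions.

```python
def resolve_overlaps(spans):
    """Keep the highest-scoring span wherever (entity_type, start, end, score) spans overlap."""
    kept = []
    # Visit the most confident spans first; skip any span that collides with one already kept.
    for span in sorted(spans, key=lambda s: s[3], reverse=True):
        _, start, end, _ = span
        if all(end <= k_start or start >= k_end for _, k_start, k_end, _ in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s[1])


def replace_with_labels(text, spans):
    """Replace each span with an <ENTITY_TYPE> label, from the last span to the first."""
    # Labels rarely have the same length as the text they replace, so going
    # right to left keeps the offsets of the spans not yet replaced valid.
    for entity_type, start, end, _ in sorted(spans, key=lambda s: s[1], reverse=True):
        text = text[:start] + f"<{entity_type}>" + text[end:]
    return text


text = "Call 555-123-1290 now"
spans = [
    ("PHONE_NUMBER", 5, 17, 0.75),
    ("US_BANK_NUMBER", 9, 17, 0.05),  # hypothetical low-confidence overlapping hit
]
print(replace_with_labels(text, resolve_overlaps(spans)))
```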

This is great so far, but what if we want the anonymization to be plain masking? In that case, we can pass a custom operator configuration to the AnonymizerEngine to perform simple masking of the PII entities. For example, here we mask the entities entirely with asterisk (*) characters.

from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig

# Assuming `results` is the output of PII entity detection by `AnalyzerEngine`.
# Operators are configured per entity type, not per occurrence, so size each
# one for the longest detected span of that type.
longest = {}
for result in results:
    longest[result.entity_type] = max(longest.get(result.entity_type, 0),
                                      result.end - result.start)

# One "mask" operator per entity type, masking from the start of the entity
operators = {
    entity_type: OperatorConfig("mask", {"chars_to_mask": length,
                                         "masking_char": "*",
                                         "from_end": False})
    for entity_type, length in longest.items()
}

anonymizer = AnonymizerEngine()
anonymized_results = anonymizer.anonymize(
    text=text, analyzer_results=results, operators=operators
)
print(anonymized_results.text)


which gives us

Applicant’s name is ******** and he lives in ********** and his phone number is ************.
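
To see what the masking parameters above control, here is a small pure-Python function that mimics them. It is an illustration written for this article, not Presidio’s source: `chars_to_mask` sets how many characters are replaced (capped at the value’s length in this sketch), and `from_end` chooses whether masking starts at the beginning or the end of the entity. Partial masks, such as keeping only the last four digits of a phone number visible, fall out of the same two knobs.

```python
def mask(value, masking_char="*", chars_to_mask=None, from_end=False):
    """Mask up to `chars_to_mask` characters of `value` (all of it when None)."""
    # Cap the count at the value's length so oversized requests mask everything.
    n = len(value) if chars_to_mask is None else max(0, min(chars_to_mask, len(value)))
    if from_end:
        # Mask the last n characters and leave the beginning visible.
        return value[:len(value) - n] + masking_char * n
    # Mask the first n characters and leave the end visible.
    return masking_char * n + value[n:]


phone = "555-123-1290"
print(mask(phone))                                  # ************
print(mask(phone, chars_to_mask=len(phone) - 4))    # ********1290
print(mask(phone, chars_to_mask=4, from_end=True))  # 555-123-****
```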

Considerations for anonymization

There are a few things to keep in mind when you decide to anonymize PII entities in the text.

- Presidio’s default AnonymizerEngine replaces PII entities with an <ENTITY_TYPE> label (like <PHONE_NUMBER>). This can cause issues, especially with LLM fine-tuning: replacing PII with entity type labels introduces words that carry semantic meaning, which can affect the behavior of language models.
- Pseudonymization is a useful tool for data protection, but exercise caution when pseudonymizing your training data. For example, replacing all NAME entities with the pseudonym John Doe, or all DATE entities with 01-JAN-2000, in your fine-tuning data may lead to extreme bias in your fine-tuned model.
- Be aware of how your LLM reacts to certain characters or patterns in your prompt. Some LLMs need prompts templated in a very specific way to get the most out of the model; for example, Anthropic recommends using prompt tags. Knowing this will help you decide how to perform anonymization.
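
One way to soften the pseudonymization bias described above is to draw surrogates from a varied pool and map each distinct original value to its own surrogate, consistently. A minimal sketch, assuming a small hard-coded, hypothetical name pool (a real pool would be much larger and should reflect the diversity you want the model to see):

```python
import random


class Pseudonymizer:
    """Map each distinct original value to its own surrogate from a pool."""

    def __init__(self, pool, seed=0):
        self._pool = list(pool)
        # Seeded shuffle: reproducible, but not simply the pool's original order.
        random.Random(seed).shuffle(self._pool)
        self._mapping = {}

    def __call__(self, original):
        if original not in self._mapping:
            if len(self._mapping) >= len(self._pool):
                raise ValueError("surrogate pool exhausted; add more names")
            # Next unused surrogate; the same original always gets the same one.
            self._mapping[original] = self._pool[len(self._mapping)]
        return self._mapping[original]


# Hypothetical, deliberately varied pool; a real one should be much larger.
NAMES = ["Amara Okafor", "Diego Ramirez", "Mei Chen",
         "Lars Nilsson", "Priya Nair", "Fatima Zahra"]

pseudo = Pseudonymizer(NAMES, seed=42)
first = pseudo("John Doe")
second = pseudo("Jane Roe")
print(first, second, pseudo("John Doe") == first)
```

Because distinct names get distinct surrogates, the data keeps both the variety of names and the fact that repeated mentions refer to the same person, instead of collapsing everyone into a single John Doe.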

There could be other general side effects of anonymized data on model fine-tuning, such as loss of context, semantic drift, model hallucinations, and so on. It is important to iterate and experiment to find the level of anonymization appropriate for your needs while minimizing its negative effects on the model’s performance.

Toxicity detection with text classification

To identify whether a text contains toxic content, we will use a binary classification approach: 0 if the text is neutral, 1 if the text is toxic. I decided to train a DistilBERT base model (uncased) [2], which is a distilled version of a BERT base model [1]. For training data, I used the Jigsaw dataset [3].
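
The exact derivation of the binary target is not described here, so the following is only one possible approach. It assumes the column layout of the Jigsaw training CSV (`comment_text` plus six 0/1 sub-label columns such as `toxic`, `insult`, and `threat`) and labels a comment 1 when any sub-label is set. The `ID2LABEL` mapping is an assumption chosen to match the NEUTRAL/TOXIC labels the model returns, and the two rows are made up for illustration.

```python
import csv
import io

# Assumed sub-label columns of the Jigsaw training CSV
SUB_LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
ID2LABEL = {0: "NEUTRAL", 1: "TOXIC"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def to_binary_examples(csv_file):
    """Yield (text, label) pairs: 1 if any toxicity sub-label is set, else 0."""
    for row in csv.DictReader(csv_file):
        label = int(any(int(row[name]) for name in SUB_LABELS))
        yield row["comment_text"], label


# Two made-up rows in the assumed column layout
sample = io.StringIO(
    "id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    "1,Thanks for the helpful edit!,0,0,0,0,0,0\n"
    "2,You are an idiot,1,0,0,0,1,0\n"
)
examples = list(to_binary_examples(sample))
print([(text, ID2LABEL[label]) for text, label in examples])
```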

I won’t go into the details of how the model was trained or its metrics; however, you can refer to this article on training a DistilBERT base model for text classification tasks. You can see the model training script I wrote here. The model is available on the Hugging Face Hub as tensor-trek/distilbert-toxicity-classifier. Let’s run a few sample pieces of text through inference to see what the model tells us.

from transformers import pipeline

texts = ["This was a masterpiece. Not completely faithful to the books, but enthralling from beginning to end. Might be my favorite of the three.",
         "I wish i could kill that bird, I hate it."]

# Load the toxicity classifier from the Hugging Face Hub and run inference
classifier = pipeline("text-classification", model="tensor-trek/distilbert-toxicity-classifier")
print(classifier(texts))

which gives us

[{'label': 'NEUTRAL', 'score': 0.9995143413543701},
 {'label': 'TOXIC', 'score': 0.9622979164123535}]

The model correctly classifies the two texts as NEUTRAL and TOXIC, respectively, with high confidence. This text classification model, in conjunction with the PII entity recognition discussed previously, can now be used to build a mechanism that enforces privacy and safety within our LLM-powered applications or services.

Putting things together

We’ve tackled privacy with a PII entity recognition mechanism, and safety with a text toxicity classifier. You may think of other mechanisms relevant to your organization’s definition of safety and privacy. For example, healthcare organizations may be more concerned with PHI than with PII. Ultimately, the overall implementation approach remains the same no matter which controls you want to introduce.

With that in mind, it is time to put everything together. We want to be able to use both the privacy and the safety mechanisms in conjunction with an LLM in an application where we want to introduce generative AI capabilities. I am going to use the Python flavor of the popular LangChain framework (also available in JavaScript/TypeScript) to build a generative AI application that includes the two mechanisms. Here’s what the overall architecture looks like.

Privacy and safety flow with LangChain

In the above architecture, the first thing I check is whether the text contains toxic content, that is, whether the toxicity classifier labels it TOXIC with a confidence score above 0.8 (80%). If so, the execution of the whole LangChain application stops at that point, and the user is shown an appropriate message. If the text is classified as neutral, I pass it on to the next step of identifying PII entities. I then anonymize each detected entity whose confidence score exceeds 0.5 (50%). Once the text is fully anonymized, it is passed as a prompt to the LLM for text generation. Note that these thresholds (80% and 50%) are arbitrary; you should evaluate both detectors (PII and toxicity) on your own data and choose the thresholds that work best for your use case. The lower the threshold, the stricter the system becomes; the higher the threshold, the weaker the enforcement of these checks.
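
The decision logic of this flow fits in a few lines. In this sketch the detector outputs are passed in as plain values in the shapes seen earlier: a classifier prediction dict with `label` and `score`, and PII findings as `(entity_type, start, end, score)` tuples. The function name, the `ToxicContentError` exception, and the `*` masking are assumptions for illustration; 0.8 and 0.5 are the example thresholds from the flow above, not recommendations.

```python
class ToxicContentError(Exception):
    """Raised to stop the flow when the prompt is classified as toxic."""


TOXICITY_THRESHOLD = 0.8  # example value from the flow above; tune on your data
PII_THRESHOLD = 0.5       # example value from the flow above; tune on your data


def enforce_privacy_and_safety(text, toxicity, pii_findings):
    """Toxicity gate first, then mask every PII finding above PII_THRESHOLD.

    toxicity:     a prediction such as {"label": "TOXIC", "score": 0.96}
    pii_findings: (entity_type, start, end, score) tuples for `text`
    """
    # Step 1: stop everything on a confident TOXIC prediction.
    if toxicity["label"] == "TOXIC" and toxicity["score"] > TOXICITY_THRESHOLD:
        raise ToxicContentError("Sorry, this request was flagged as toxic and was not processed.")
    # Step 2: mask confident PII spans. '*' masks keep the text length unchanged,
    # so the remaining offsets stay valid regardless of processing order.
    for _, start, end, score in pii_findings:
        if score > PII_THRESHOLD:
            text = text[:start] + "*" * (end - start) + text[end:]
    return text  # this anonymized text becomes the prompt sent to the LLM


prompt = "My phone is 555-123-1290, call me."
findings = [("PHONE_NUMBER", 12, 24, 0.75), ("PERSON", 0, 2, 0.3)]
print(enforce_privacy_and_safety(prompt, {"label": "NEUTRAL", "score": 0.99}, findings))
```

The low-confidence PERSON finding is left untouched, which shows how the 0.5 threshold trades recall for fewer spurious masks.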

Another, more conservative approach would be to stop execution if any PII entities are detected at all. This can be useful for applications that are not certified to handle PII and where you want to guarantee that text containing PII never ends up as an input to the application.

Privacy and safety flow with LangChain — alternate flow
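
In this conservative variant, the PII step becomes a second gate instead of an anonymization step. A minimal sketch, using the same hypothetical `(entity_type, start, end, score)` finding format; whether a low-confidence finding should still block is a per-application choice, exposed here as `min_score` (0.0 by default, so any finding blocks).

```python
class PIIDetectedError(Exception):
    """Raised when PII is found and the application must not accept it."""


def reject_if_pii(text, pii_findings, min_score=0.0):
    """Return `text` unchanged, or raise if any finding reaches `min_score`."""
    # Collect the entity types that should block; the text is never modified.
    blocked = sorted({entity_type for entity_type, _, _, score in pii_findings
                      if score >= min_score})
    if blocked:
        raise PIIDetectedError(f"Input contains PII ({', '.join(blocked)}); request rejected.")
    return text


try:
    reject_if_pii("My SSN is 338-12-6789", [("US_SSN", 10, 21, 0.85)])
except PIIDetectedError as err:
    print(err)
```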

LangChain implementation

To make this work with LangChain, I created a custom chain called PrivacyAndSafetyChain. It can be chained with any LangChain-supported LLM to implement the privacy and safety mechanism. Here is how it is used:

from langchain import HuggingFaceHub
from langchain import PromptTemplate
from PrivacyAndSafety import PrivacyAndSafetyChain  # custom chain from my GitHub repository

# Only PHONE_NUMBER and US_SSN entities will be anonymized
safety_privacy = PrivacyAndSafetyChain(verbose=True,
                                       pii_labels=["PHONE_NUMBER", "US_SSN"])

template = """{question}"""

prompt = PromptTemplate(template=template, input_variables=["question"])

# Placeholder: set this to the Hugging Face Hub model you want to use
repo_id = "<your-hugging-face-model-repo-id>"
llm = HuggingFaceHub(
    repo_id=repo_id, model_kwargs={"temperature": 0.5, "max_length": 256}
)

# Check the prompt, send the sanitized text to the LLM, then check the LLM's answer
chain = (
    prompt
    | safety_privacy
    | {"input": (lambda x: x["output"]) | llm}
    | safety_privacy
)

try:
    response = chain.invoke({"question": """What is John Doe’s address, phone number and SSN from the following text?

John Doe, a resident of 1234 Elm Street in Springfield, recently celebrated his birthday on January 1st. Turning 43 this year, John reflected on the years gone by. He often shares memories of his younger days with his close friends through calls on his phone, (555) 123-4567. Meanwhile, during a casual evening, he received an email at jo*****@ex*****.com reminding him of an old acquaintance’s reunion. As he navigated through some old documents, he stumbled upon a paper that listed his SSN as 338-12-6789, reminding him to store it in a safer place.
"""})
except Exception as e:
    # PrivacyAndSafetyChain raises an error when it detects toxic content
    print(str(e))
else:
    print(response["output"])

By default, PrivacyAndSafetyChain performs toxicity detection first. If it detects toxic content, it raises an error, which stops the chain as discussed earlier. If not, it passes the text to the PII entity recognizer and anonymizes the detected PII entities using the configured masking character. The output of the preceding code is shown below. The log shows the chain running twice: once on the prompt and once on the LLM’s response. Since there is no toxic content, the chain didn’t stop; it detected the PHONE_NUMBER and the SSN and correctly anonymized them. Because pii_labels was limited to PHONE_NUMBER and US_SSN, the address was left unmasked.

> Entering new PrivacyAndSafetyChain chain…
Running PrivacyAndSafetyChain…
Checking for Toxic content…
Checking for PII…

> Finished chain.

> Entering new PrivacyAndSafetyChain chain…
Running PrivacyAndSafetyChain…
Checking for Toxic content…
Checking for PII…

> Finished chain.
1234 Elm Street, **************, ***********


The biggest takeaway from this post is that, as we continue to innovate with Large Language Models, it becomes imperative to balance innovation with safety and privacy. The enthusiasm surrounding LLMs, and our ever-growing desire to integrate them into a whole world of possible use cases, is undeniable. However, the potential pitfalls, such as data privacy breaches, unintended biases, or misuse, are equally real and warrant our immediate attention. I covered how you can establish a mechanism for detecting PII and toxic content going into your LLMs and discussed an implementation with LangChain.

There’s still much research and development to be done, perhaps toward a better architecture or a more reliable and seamless way to ensure data privacy and safety. The code in this post is trimmed down for brevity, but I encourage you to check out my GitHub repository, where I have collated detailed notebooks on each step along with the full source code of the custom LangChain chain we discussed. Use it, fork it, improve it, go forth and innovate!


[1] Jacob Devlin, Ming-Wei Chang, et al., BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

[2] Victor Sanh, Lysandre Debut, et al., DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

[3] Jigsaw Multilingual Toxic Comment Classification (2020), dataset

Unless otherwise noted, all images are by the author

Balancing Innovation With Safety & Privacy in the Era of Large Language Models (LLM) was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
