Simple Anonymization for AI Tools

When you’re building tools that send data to AI APIs, you have a privacy problem. Messages, documents, and logs often contain names—people, companies, clients. Sending those directly to an external API creates risk.

I needed a simple solution for my Slack inbox triage tool. Here’s what I built.

Why Anonymize?

Three reasons:

Concern	Risk
Privacy	Personal names in AI training data or logs
Compliance	Some jurisdictions require data minimization
Liability	Less exposure if something goes wrong

There’s also a practical benefit: the AI focuses on content, not personalities. “Person said X” tends to get evaluated more neutrally than “John said X”, because the model can’t draw on whatever associations it has with a particular name.

The Approach

My requirements were simple:

Replace person names with “Person”
Replace company names with “Company”
Fast enough to run on every message
No external API calls (that would defeat the purpose)

I settled on Compromise, a lightweight JavaScript NLP library. It’s fast, runs locally, and handles named entity recognition well enough for this use case.

The Code

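import nlp from 'compromise';

// Escape regex metacharacters so a name is matched literally.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Replace whole-word occurrences of `name`. Lookarounds are used instead of
// \b so names that start or end with punctuation still match. Matching is
// case-sensitive so a name like "Will" doesn't also replace the word "will".
function replaceName(text: string, name: string, placeholder: string): string {
  const regex = new RegExp(`(?<!\\w)${escapeRegExp(name)}(?!\\w)`, 'g');
  return text.replace(regex, placeholder);
}

// Trim whitespace and trailing punctuation, and drop empty strings
// (an empty name would match at every position in the text).
function cleanNames(names: string[]): string[] {
  return names.map(n => n.trim().replace(/[.,;:!?]+$/, '')).filter(n => n.length > 0);
}

// Regex fallback for legal entities NLP misses: capitalized words (optionally
// joined by "&") followed by a legal suffix. The name part is limited to
// capitalized words so the match can't run back over the whole sentence,
// and there's no `i` flag, so words like "since" or "limited" don't match.
const legalEntityPattern =
  /\b[A-Z][\w'’-]*(?:\s+(?:&|[A-Z][\w'’-]*))*?\s+(?:Pty\.?\s*Ltd|L\.?L\.?C|P\.?L\.?C|Inc|Corp|Limited)\.?(?!\w)/g;

export function anonymizeNames(text: string): string {
  const doc = nlp(text);

  const peopleNames = cleanNames(doc.people().out('array') as string[]);
  const orgNames = cleanNames([
    ...(doc.organizations().out('array') as string[]),
    ...(text.match(legalEntityPattern) ?? []),
  ]);

  if (peopleNames.length === 0 && orgNames.length === 0) {
    return text;
  }

  let result = text;

  // Replace orgs first, longest first, to avoid partial replacements
  const sortedOrgs = [...new Set(orgNames)].sort((a, b) => b.length - a.length);
  for (const org of sortedOrgs) {
    result = replaceName(result, org, 'Company');
  }

  // Then people, also longest first ("John Smith" before "John")
  const sortedPeople = [...new Set(peopleNames)].sort((a, b) => b.length - a.length);
  for (const person of sortedPeople) {
    result = replaceName(result, person, 'Person');
  }

  return result;
}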
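Whole-word matching with `\b` has one subtle gap worth knowing about when building these patterns. `\b` only matches between a word character and a non-word character, so a name that ends in punctuation, such as “Innovation Inc.”, can’t match `\bInnovation Inc\.\b` when it’s followed by a space or sits at the end of the text. Here’s a small standalone comparison using the lookarounds `(?<!\w)` and `(?!\w)` as the alternative (these need an ES2018-capable engine, which current Node versions are). The helper names are made up for illustration:

```typescript
// Escape regex metacharacters so a name is matched literally.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Whole-word pattern built with \b on both sides.
function wordBoundaryPattern(entity: string): RegExp {
  return new RegExp(`\\b${escapeRegExp(entity)}\\b`, 'g');
}

// Same idea with lookarounds: "not preceded / not followed by a word character".
// Unlike \b, this still works when the name starts or ends with punctuation.
function lookaroundPattern(entity: string): RegExp {
  return new RegExp(`(?<!\\w)${escapeRegExp(entity)}(?!\\w)`, 'g');
}

const text = 'We signed with Innovation Inc. last week.';
const entityName = 'Innovation Inc.';

console.log(text.replace(wordBoundaryPattern(entityName), 'Company')); // unchanged
console.log(text.replace(lookaroundPattern(entityName), 'Company'));   // "We signed with Company last week."
```

The `\b` version silently leaves the name in place, which is the worst kind of failure for an anonymizer because nothing errors.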

Why the Regex Fallback?

Compromise handles “John Smith” and “Acme Corporation” well. But it misses legal entity patterns like:

“Smith & Associates Pty Ltd”
“TechCorp LLC”
“Innovation Inc.”

The regex catches these with a pattern that looks for common legal suffixes: Pty Ltd, LLC, Inc, Corp, Limited.
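A suffix pattern like this is easy to make too loose. If the name part is something like `[\w\s]+` and the `i` flag is on, the match can run back over the whole preceding phrase, and “Inc” also matches the tail of “since”. A sketch of a stricter variant, assuming entity names are runs of capitalized words optionally joined by “&” (`findLegalEntities` is a hypothetical helper):

```typescript
// Capitalized words (optionally joined by "&"), then a legal suffix as its
// own word. Case-sensitive on purpose: lowercase "limited" or "since" can't match.
const LEGAL_ENTITY =
  /\b[A-Z][\w'’-]*(?:\s+(?:&|[A-Z][\w'’-]*))*?\s+(?:Pty\.?\s*Ltd|L\.?L\.?C|P\.?L\.?C|Inc|Corp|Limited)\.?(?!\w)/g;

function findLegalEntities(input: string): string[] {
  // String.prototype.match with a /g regex returns all matches, or null.
  return input.match(LEGAL_ENTITY) ?? [];
}

console.log(findLegalEntities('We invoiced TechCorp LLC and Smith & Associates Pty Ltd since Innovation Inc. was late.'));
// [ 'TechCorp LLC', 'Smith & Associates Pty Ltd', 'Innovation Inc.' ]
console.log(findLegalEntities('We have limited time since the corp merger.'));
// []
```

It still has a known weakness: a capitalized word directly before the entity, typically at the start of a sentence (“Yesterday TechCorp LLC”), gets pulled into the match.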

The order matters too. Replace organizations first because they’re typically longer. If you replace “John” before “John’s Company Pty Ltd”, you end up with “Person’s Company Pty Ltd” instead of “Company”.
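The same longest-first rule can be applied across both entity types at once. Here’s a small standalone sketch, assuming you already have a list of `[name, placeholder]` pairs (for example, names extracted by Compromise). `replaceEntities` is a hypothetical helper, and sorting people and organizations together by length is a variation on replacing all organizations first:

```typescript
// Escape regex metacharacters so a name is matched literally.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Replace each entity with its placeholder, longest entity first, so a short
// name can't break up a longer entity that contains it.
function replaceEntities(input: string, entities: Array<[string, string]>): string {
  const sorted = [...entities].sort(([a], [b]) => b.length - a.length);
  let result = input;
  for (const [entity, placeholder] of sorted) {
    result = result.replace(new RegExp(`(?<!\\w)${escapeRegExp(entity)}(?!\\w)`, 'g'), placeholder);
  }
  return result;
}

const text = 'John’s Company Pty Ltd hired John.';
const entities: Array<[string, string]> = [
  ['John', 'Person'],
  ['John’s Company Pty Ltd', 'Company'],
];

console.log(replaceEntities(text, entities)); // "Company hired Person."
```

Even though “John” comes first in the input list, the sort ensures the company name is replaced before the shorter person name can split it.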

Where to Apply It

Two places make sense:

Before AI calls — The primary use case. Strip names before sending to Claude, GPT, or any external API.

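import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const prompt = anonymizeNames(userMessage);
const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024, // required by the Messages API
  messages: [{ role: 'user', content: prompt }]
});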

When storing logs/archives — If you’re keeping conversation history for debugging or training, anonymize before storage. This way even your local archives don’t have unnecessary PII.

const archiveEntry = {
  timestamp: new Date().toISOString(),
  content: anonymizeNames(originalMessage),
  action: 'replied'
};

Trade-offs

It’s not perfect. NLP-based entity recognition will miss unusual names and occasionally flag common words as names. For most use cases, this is fine—you’re reducing exposure, not eliminating it.

No reversibility. Once names are replaced with “Person” and “Company”, you can’t get them back. If you need them, you’d have to keep a mapping table, which adds complexity.
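A minimal sketch of what that mapping table could look like, assuming you already have the list of names to hide. `pseudonymize` and `restore` are hypothetical helpers, not part of Compromise or any other library. Numbered placeholders like “Person-1” also let the model tell people apart, at the cost of keeping the mapping somewhere safe. Note that without coreference resolution, “John” and “John Smith” get different numbers even if they’re the same person:

```typescript
// Escape regex metacharacters so a string is matched literally.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Whole-word, case-sensitive pattern for a literal string.
function wholeWord(value: string): RegExp {
  return new RegExp(`(?<!\\w)${escapeRegExp(value)}(?!\\w)`, 'g');
}

interface Pseudonymized {
  text: string;
  mapping: Map<string, string>; // placeholder -> original name
}

function pseudonymize(input: string, names: string[], label: string): Pseudonymized {
  const mapping = new Map<string, string>();
  let result = input;
  // Longest names first so "John Smith" is replaced before "John".
  const unique = [...new Set(names)].sort((a, b) => b.length - a.length);
  unique.forEach((original, i) => {
    const placeholder = `${label}-${i + 1}`;
    mapping.set(placeholder, original);
    result = result.replace(wholeWord(original), placeholder);
  });
  return { text: result, mapping };
}

// Swap placeholders back, e.g. in the model's reply. The (?!\w) lookahead
// keeps "Person-1" from matching the start of "Person-10".
function restore(input: string, mapping: Map<string, string>): string {
  let result = input;
  for (const [placeholder, original] of mapping) {
    result = result.replace(wholeWord(placeholder), original);
  }
  return result;
}

const pseudo = pseudonymize('John Smith asked John about the release.', ['John', 'John Smith'], 'Person');
console.log(pseudo.text); // "Person-1 asked Person-2 about the release."
console.log(restore(pseudo.text, pseudo.mapping)); // "John Smith asked John about the release."
```

Restoring only works if the model repeats the placeholders exactly, so it suits structured outputs better than free-form prose.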

English-focused. Compromise works best with English text. Other languages would need different tools.

When Not to Use This

If you’re building a system where names matter—like a CRM integration or contact management—this approach breaks functionality. Anonymization is for contexts where names are incidental to the AI’s task.

My Slack triage tool is a good example. The AI classifies messages by urgency. It doesn’t need to know who John is; it just needs to understand “Person has a blocker and needs help.”

Results

In practice, this adds maybe 50ms per message. Unnoticeable. And it means I can send Slack messages to Claude without worrying about who’s mentioned in them.

The approach is deliberately simple. No machine learning models, no external services, no complex configuration. Just NLP + regex, running locally, fast enough to apply everywhere.