Open-source · Apache 2.0 · 35 entity types · runs locally

Make your AI data processing EU‑compliant

Detect and anonymize personal data before it enters your AI pipeline. The model targets personal data as defined in GDPR Art. 4(1) and the special categories under Art. 9(1), and supports the privacy-preserving safeguards required by EU AI Act Art. 10(5). It is built on bardsai/eu-pii-anonimization-multilang, an open-source XLM-RoBERTa token classifier that runs entirely in your browser via ONNX/WebAssembly, so no data leaves your device.

Supported EU languages: English, Polish, German, French, Spanish, Italian, Dutch, Portuguese, Romanian, Czech, Swedish, Greek, Hungarian, Bulgarian, Croatian, Danish, Estonian, Finnish, Irish, Latvian, Lithuanian, Maltese, Slovak, Slovenian

Results preview

Employee Anna Wisniewska [PERSON_NAME], who works in the IT [PERSON_ROLE_OR_TITLE] department, receives a gross salary of 12,500 PLN [FINANCIAL_AMOUNT]. Passport number: EP1234567 [ORGANIZATION_IDENTIFIER]. Company tax ID: 527-020-1234 [ORGANIZATION_IDENTIFIER]. Home address: 22 Piekna Street, 00-549 [POSTAL_ADDRESS] Warsaw [LOCATION]. Religion: Catholicism [RELIGION_OR_BELIEF]. Member of the NSZZ Solidarność [TRADE_UNION_MEMBERSHIP] union.

Detected Entities

Entity | Type | Regulation | Confidence
Anna Wisniewska | PERSON_NAME | Art. 4(1) | 100%
IT | PERSON_ROLE_OR_TITLE | Art. 4(1) | 85%
12,500 PLN | FINANCIAL_AMOUNT | Art. 4(1) | 91%
EP1234567 | ORGANIZATION_IDENTIFIER | Art. 4(1) | 64%
527-020-1234 | ORGANIZATION_IDENTIFIER | Art. 4(1) | 100%
22 Piekna Street, 00-549 | POSTAL_ADDRESS | Art. 4(1) | 100%
Warsaw | LOCATION | Art. 4(1) | 89%
Catholicism | RELIGION_OR_BELIEF | Art. 9(1) | 100%
NSZZ Solidarność | TRADE_UNION_MEMBERSHIP | Art. 9(1) | 53%
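
The detections listed above lend themselves to simple post-processing, for example routing low-confidence hits to a human reviewer. The following is a minimal sketch only: the `Detection` record, the helper names, and the 0.70 review threshold are assumptions made for illustration and are not part of the model's published interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """Hypothetical record for one detected entity (not the model's real output type)."""
    text: str
    entity_type: str
    regulation: str
    confidence: float  # 0.0 to 1.0


# Values copied from the results preview above.
DETECTIONS = [
    Detection("Anna Wisniewska", "PERSON_NAME", "Art. 4(1)", 1.00),
    Detection("IT", "PERSON_ROLE_OR_TITLE", "Art. 4(1)", 0.85),
    Detection("12,500 PLN", "FINANCIAL_AMOUNT", "Art. 4(1)", 0.91),
    Detection("EP1234567", "ORGANIZATION_IDENTIFIER", "Art. 4(1)", 0.64),
    Detection("527-020-1234", "ORGANIZATION_IDENTIFIER", "Art. 4(1)", 1.00),
    Detection("22 Piekna Street, 00-549", "POSTAL_ADDRESS", "Art. 4(1)", 1.00),
    Detection("Warsaw", "LOCATION", "Art. 4(1)", 0.89),
    Detection("Catholicism", "RELIGION_OR_BELIEF", "Art. 9(1)", 1.00),
    Detection("NSZZ Solidarność", "TRADE_UNION_MEMBERSHIP", "Art. 9(1)", 0.53),
]

REVIEW_THRESHOLD = 0.70  # assumption: an arbitrary cut-off chosen for this example


def needs_review(detections: List[Detection],
                 threshold: float = REVIEW_THRESHOLD) -> List[Detection]:
    """Return detections whose confidence falls below the threshold."""
    return [d for d in detections if d.confidence < threshold]


def special_category(detections: List[Detection]) -> List[Detection]:
    """Return detections flagged under GDPR Art. 9(1)."""
    return [d for d in detections if d.regulation == "Art. 9(1)"]
```

With the sample values, `needs_review(DETECTIONS)` picks out the passport number (64%) and the trade union name (53%), which are the two rows a reviewer would most plausibly want to check by hand.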

Entity taxonomy

35 entity types across 8 categories, mapped to specific GDPR articles. Art. 9(1) special categories (health, biometric, genetic, political, religious, ethnic, sexual orientation, trade union) are flagged separately to support higher-risk processing rules.

Personal Identity

GDPR Art. 4(1)
PERSON_NAME, DATE_OF_BIRTH, PERSON_ATTRIBUTE, PERSON_ALIAS, PERSON_IDENTIFIER

Organizations

GDPR Art. 4(1)
ORGANIZATION_NAME, ORGANIZATION_IDENTIFIER

Contact & Location

GDPR Art. 4(1)
EMAIL_ADDRESS, PHONE_NUMBER, CONTACT_HANDLE, POSTAL_ADDRESS, LOCATION, GEO_LOCATION

Technical Identifiers

GDPR Recital 30
IP_ADDRESS, DEVICE_IDENTIFIER, COOKIE_IDENTIFIER, ACCOUNT_IDENTIFIER, AUTH_SECRET

Financial

GDPR Art. 4(1)
BANK_ACCOUNT_IDENTIFIER, PAYMENT_CARD, PAYMENT_CARD_SECURITY, DOCUMENT_REFERENCE, FINANCIAL_AMOUNT, INCOME_COMPENSATION, VEHICLE_IDENTIFIER

Health & Biometric

GDPR Art. 9(1)
HEALTH_DATA, GENETIC_DATA, BIOMETRIC_DATA

Special Categories

GDPR Art. 9(1)
RELIGION_OR_BELIEF, POLITICAL_OPINION, SEXUAL_ORIENTATION, TRADE_UNION_MEMBERSHIP, ETHNIC_ORIGIN, CRIMINAL_OFFENCE_DATA (criminal-offence data falls under GDPR Art. 10 rather than Art. 9(1))

Employment

GDPR Art. 88
PERSON_ROLE_OR_TITLE
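
The taxonomy above can be expressed as a lookup table so that downstream code can branch on the legal basis of a detected type. This is a sketch under stated assumptions: the category names, entity types, and GDPR references are taken from the lists above, while the `TAXONOMY` structure and the `lookup` and `is_special_category` helpers are hypothetical names introduced for illustration.

```python
from typing import Dict, List, Tuple

# category -> (GDPR reference as listed on this page, entity types)
TAXONOMY: Dict[str, Tuple[str, List[str]]] = {
    "Personal Identity": ("GDPR Art. 4(1)", [
        "PERSON_NAME", "DATE_OF_BIRTH", "PERSON_ATTRIBUTE",
        "PERSON_ALIAS", "PERSON_IDENTIFIER"]),
    "Organizations": ("GDPR Art. 4(1)", [
        "ORGANIZATION_NAME", "ORGANIZATION_IDENTIFIER"]),
    "Contact & Location": ("GDPR Art. 4(1)", [
        "EMAIL_ADDRESS", "PHONE_NUMBER", "CONTACT_HANDLE",
        "POSTAL_ADDRESS", "LOCATION", "GEO_LOCATION"]),
    "Technical Identifiers": ("GDPR Recital 30", [
        "IP_ADDRESS", "DEVICE_IDENTIFIER", "COOKIE_IDENTIFIER",
        "ACCOUNT_IDENTIFIER", "AUTH_SECRET"]),
    "Financial": ("GDPR Art. 4(1)", [
        "BANK_ACCOUNT_IDENTIFIER", "PAYMENT_CARD", "PAYMENT_CARD_SECURITY",
        "DOCUMENT_REFERENCE", "FINANCIAL_AMOUNT", "INCOME_COMPENSATION",
        "VEHICLE_IDENTIFIER"]),
    "Health & Biometric": ("GDPR Art. 9(1)", [
        "HEALTH_DATA", "GENETIC_DATA", "BIOMETRIC_DATA"]),
    # Mirrors the page's grouping, which lists CRIMINAL_OFFENCE_DATA here.
    "Special Categories": ("GDPR Art. 9(1)", [
        "RELIGION_OR_BELIEF", "POLITICAL_OPINION", "SEXUAL_ORIENTATION",
        "TRADE_UNION_MEMBERSHIP", "ETHNIC_ORIGIN", "CRIMINAL_OFFENCE_DATA"]),
    "Employment": ("GDPR Art. 88", ["PERSON_ROLE_OR_TITLE"]),
}


def lookup(entity_type: str) -> Tuple[str, str]:
    """Return (category, GDPR reference) for an entity type, or raise KeyError."""
    for category, (reference, types) in TAXONOMY.items():
        if entity_type in types:
            return category, reference
    raise KeyError(entity_type)


def is_special_category(entity_type: str) -> bool:
    """True when the page maps the entity type to GDPR Art. 9(1)."""
    return lookup(entity_type)[1] == "GDPR Art. 9(1)"
```

A pipeline could use `is_special_category` to apply stricter handling, such as mandatory review or unconditional redaction, to the higher-risk types.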

Regulatory context

This model addresses requirements from two EU regulations. Below are the specific provisions that PII detection and anonymisation tools help satisfy.

GDPR (Regulation 2016/679)

Art. 4(1) — Personal data
Defines the scope: any information relating to an identified or identifiable natural person. The 35 entity types in this model map directly to the identifiers listed here.
Art. 5(1)(c) — Data minimisation
Personal data must be adequate, relevant, and limited to what is necessary. Automated PII detection is a technical measure to enforce minimisation at scale.
Art. 9(1) — Special categories
Processing of health, biometric, genetic, racial/ethnic, political, religious, trade union, and sexual orientation data is prohibited by default. This model detects all eight special categories.
Art. 25 — Data protection by design
Controllers must implement appropriate technical and organisational measures, such as pseudonymisation, that put data-protection principles into effect. PII detection is a prerequisite for pseudonymisation.
Art. 32(1)(a) — Security of processing
Lists pseudonymisation and encryption as appropriate security measures. Automated PII detection enables systematic pseudonymisation of personal data.
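
To illustrate how detection feeds pseudonymisation, the sketch below replaces a detected value with a keyed hash. Keyed hashing (HMAC-SHA256) is one common pseudonymisation technique and is used here as an assumption; the page does not say which technique the tool applies. The `pseudonymise` function and its parameters are hypothetical. The secret key plays the role of the "additional information" that GDPR Art. 4(5) requires to be kept separately.

```python
import hashlib
import hmac


def pseudonymise(value: str, entity_type: str, key: bytes, length: int = 12) -> str:
    """Return a stable pseudonym for a detected value.

    The same (entity_type, value, key) always yields the same pseudonym, so
    records can still be linked, while the original value cannot be read
    from the output without the key holder's lookup table.
    """
    message = f"{entity_type}:{value}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()
    # Truncation shortens the pseudonym; it slightly raises collision risk.
    return f"{entity_type}_{digest[:length]}"


# Example usage with a throwaway key (a real key must be generated and
# stored securely, separately from the pseudonymised data).
demo_key = b"demo-key-do-not-use"
pseudonym = pseudonymise("Anna Wisniewska", "PERSON_NAME", demo_key)
```

Because the pseudonym is deterministic per key, rotating or destroying the key breaks the link between old and new pseudonyms, which may or may not be desirable depending on the use case.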

EU AI Act (Regulation 2024/1689)

Art. 10(5) — Data governance for high-risk AI
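Special-category personal data may be processed for bias detection and correction only when that purpose cannot be effectively fulfilled with other data, including anonymised or synthetic data. Where such processing is necessary, the data must be subject to state-of-the-art security and privacy-preserving measures, including pseudonymisation.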
Art. 59 — AI regulatory sandboxes
Personal data in sandboxes may only be processed when anonymised or synthetic data is insufficient. Establishes an anonymisation-first principle for AI development.
Recital 69 — Privacy throughout AI lifecycle
Data minimisation and protection by design must be ensured throughout the entire AI lifecycle. Lists anonymisation and encryption as compliance measures.
Annex III, §1 — High-risk: biometrics
Remote biometric identification and categorisation systems based on sensitive attributes are classified as high-risk, requiring additional compliance obligations.

LLM inference without PII exposure

Send data to any third-party LLM safely. PII is replaced with indexed tokens before leaving your infrastructure, and restored in the final output.

1. Original prompt (your infrastructure)
Extract the invoice details: issued to John Kowalski, tax ID 527-020-1234, address 10/5 Marszalkowska Street, 00-001 Warsaw, account PL61 1090 1014 0000 0712 1981 2874, amount 12,500 PLN
↓ Anonymize
2. Anonymized prompt
Extract the invoice details: issued to [PERSON_NAME_1], tax ID [ORGANIZATION_IDENTIFIER_1], address [POSTAL_ADDRESS_1], account [BANK_ACCOUNT_IDENTIFIER_1], amount [FINANCIAL_AMOUNT_1]
→ Send to third-party LLM
3. LLM response (external, third-party LLM)
{
  "vendor": "[PERSON_NAME_1]",
  "tax_id": "[ORGANIZATION_IDENTIFIER_1]",
  "address": "[POSTAL_ADDRESS_1]",
  "iban": "[BANK_ACCOUNT_IDENTIFIER_1]",
  "amount": "[FINANCIAL_AMOUNT_1]"
}
← Response returns to your infrastructure
↓ De-anonymize
4. Final result
{
  "vendor": "John Kowalski",
  "tax_id": "527-020-1234",
  "address": "10/5 Marszalkowska Street, ...",
  "iban": "PL61 1090 ... 2874",
  "amount": "12,500 PLN"
}
Entity mapping (stored locally, never sent)
[PERSON_NAME_1] ↔ John Kowalski
[ORGANIZATION_IDENTIFIER_1] ↔ 527-020-1234
[POSTAL_ADDRESS_1] ↔ 10/5 Marszalkowska Street, ...
[BANK_ACCOUNT_IDENTIFIER_1] ↔ PL61 1090 ... 2874
[FINANCIAL_AMOUNT_1] ↔ 12,500 PLN
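
The round trip above (replace PII with indexed tokens, call the LLM, restore the values) can be sketched as follows. This is an illustration under assumptions: the `(start, end, entity_type)` span format stands in for whatever the detector actually returns, and `anonymize` and `deanonymize` are hypothetical helper names. Spans are assumed not to overlap.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, entity_type); assumed detector output


def anonymize(text: str, spans: List[Span]) -> Tuple[str, Dict[str, str]]:
    """Replace each span with an indexed token such as [PERSON_NAME_1].

    Returns the anonymized text and a token -> original value mapping that
    stays on the caller's side.
    """
    mapping: Dict[str, str] = {}                 # token -> original value
    seen: Dict[Tuple[str, str], str] = {}        # (type, value) -> token
    counters: Dict[str, int] = {}                # per-type running index
    parts: List[str] = []
    cursor = 0
    for start, end, entity_type in sorted(spans):
        value = text[start:end]
        token = seen.get((entity_type, value))
        if token is None:
            counters[entity_type] = counters.get(entity_type, 0) + 1
            token = f"[{entity_type}_{counters[entity_type]}]"
            seen[(entity_type, value)] = token
            mapping[token] = value
        parts.append(text[cursor:start])
        parts.append(token)
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts), mapping


def deanonymize(text: str, mapping: Dict[str, str]) -> str:
    """Restore original values in text returned by the external LLM."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


prompt = ("Extract the invoice details: issued to John Kowalski, "
          "tax ID 527-020-1234, address 10/5 Marszalkowska Street, 00-001 Warsaw, "
          "account PL61 1090 1014 0000 0712 1981 2874, amount 12,500 PLN")
# Spans are located by string search here only to keep the example self-contained.
values = [
    ("John Kowalski", "PERSON_NAME"),
    ("527-020-1234", "ORGANIZATION_IDENTIFIER"),
    ("10/5 Marszalkowska Street, 00-001 Warsaw", "POSTAL_ADDRESS"),
    ("PL61 1090 1014 0000 0712 1981 2874", "BANK_ACCOUNT_IDENTIFIER"),
    ("12,500 PLN", "FINANCIAL_AMOUNT"),
]
spans = [(prompt.index(v), prompt.index(v) + len(v), t) for v, t in values]
safe_prompt, entity_mapping = anonymize(prompt, spans)
```

Only `safe_prompt` would be sent to the third-party LLM; `entity_mapping` stays local and is passed to `deanonymize` once the response comes back. Repeated occurrences of the same value reuse the same token, so the LLM can still tell that two mentions refer to the same entity.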

GDPR-safe model training data

Prepare datasets that comply with EU AI Act Art. 10(5). Real PII is replaced with synthetic data: the text structure stays intact for learning, but the records contain no real personal information.

1. Original training record
Patient Emil Nowak (national ID: 91082734567), living at 22 Lipowa Street, 30-702 Krakow, presented with chest pain. Attending physician: Dr. Anna Wisniewska.
↓ Detect & replace with synthetic data
2. Anonymized training record
Patient Thomas Zielinski (national ID: 85120498321), living at 7 Debowa Street, 50-307 Wroclaw, presented with headache. Attending physician: Dr. Katarzyna Lewandowska.
↓ Safe for training
3. Model training
Text structure preserved. No real person identifiable. Compliant with GDPR Art. 89 and EU AI Act Art. 10(5).
Each record gets unique synthetic replacements:
Emil Nowak → Thomas Zielinski
91082734567 → 85120498321
22 Lipowa Street, ... → 7 Debowa Street, ...
chest pain → headache
Dr. Anna Wisniewska → Dr. K. Lewandowska
Legend: standard PII, Art. 4(1); special category, Art. 9(1)
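
The synthetic-replacement step for training data can be sketched in a similar way. Everything here is an assumption for illustration: the tiny `SYNTHETIC_POOLS` lists, the `synthesize` helper, and the `(start, end, entity_type)` span format are hypothetical, and a real pipeline would need much larger, realistic value generators per entity type.

```python
import random
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, entity_type); assumed detector output

# Illustrative pools only; real generators would be far larger.
SYNTHETIC_POOLS: Dict[str, List[str]] = {
    "PERSON_NAME": ["Thomas Zielinski", "Katarzyna Lewandowska", "Marek Kaminski"],
    "HEALTH_DATA": ["headache", "back pain", "mild fever"],
}


def synthesize(text: str, spans: List[Span],
               rng: random.Random) -> Tuple[str, Dict[Tuple[str, str], str]]:
    """Replace each detected span with a synthetic value of the same type.

    Within one record, the same original value always maps to the same
    replacement, and different originals get different replacements.
    Raises IndexError if a pool runs out of unused values.
    """
    replacements: Dict[Tuple[str, str], str] = {}
    parts: List[str] = []
    cursor = 0
    for start, end, entity_type in sorted(spans):
        original = text[start:end]
        key = (entity_type, original)
        if key not in replacements:
            taken = set(replacements.values())
            candidates = [c for c in SYNTHETIC_POOLS[entity_type]
                          if c != original and c not in taken]
            replacements[key] = rng.choice(candidates)
        parts.append(text[cursor:start])
        parts.append(replacements[key])
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts), replacements


record = ("Patient Emil Nowak presented with chest pain. "
          "Attending physician: Dr. Anna Wisniewska.")
pii = [("Emil Nowak", "PERSON_NAME"), ("chest pain", "HEALTH_DATA"),
       ("Anna Wisniewska", "PERSON_NAME")]
record_spans = [(record.index(v), record.index(v) + len(v), t) for v, t in pii]
synthetic_record, used = synthesize(record, record_spans, random.Random(0))
```

Passing a separately seeded `random.Random` per record is one way to give each record its own replacements while keeping runs reproducible. The surrounding text ("Patient ... presented with ...") is left untouched, which is what preserves the structure for training.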