About

bards.ai

bards.ai builds NLP models for privacy and data protection. The focus is on making GDPR-compliant text processing practical for organisations that handle personal data under EU regulations.

The model

bardsai/eu-pii-anonimization-multilang is a token-classification model fine-tuned from XLM-RoBERTa. It detects 35 entity types that correspond to personal data as defined in GDPR Art. 4(1) and to the special categories listed in Art. 9(1), including health, biometric, genetic, racial or ethnic origin, political opinion, religious belief, trade union membership, and sexual orientation data.

The model achieves an F1 score of approximately 95% and is available on Hugging Face in standard and ONNX-quantized formats.
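
To illustrate how the output of a token-classification model like this one might be used downstream, here is a minimal TypeScript sketch that replaces detected spans with label placeholders. The PiiSpan shape, the redact helper, and the label names "PERSON" and "EMAIL" are hypothetical and chosen for illustration only. The model's actual label set and the exact output format of the inference library may differ, and character offsets may need to be derived from token-level predictions first.

```typescript
// Hypothetical shape for one detected entity: character offsets into the
// input text plus a label. The real model's label names may differ.
interface PiiSpan {
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
  label: string; // illustrative label, e.g. "EMAIL"
}

// Replace every detected span with a "[LABEL]" placeholder.
// Assumes all offsets lie within the bounds of `text`.
function redact(text: string, spans: PiiSpan[]): string {
  // Sort by start offset so the output can be assembled left to right.
  const ordered = [...spans].sort((a, b) => a.start - b.start);
  let out = "";
  let cursor = 0;
  for (const span of ordered) {
    // Skip empty spans and spans already covered by an earlier redaction.
    if (span.end <= span.start || span.end <= cursor) continue;
    if (span.start >= cursor) {
      out += text.slice(cursor, span.start) + `[${span.label}]`;
    }
    // On a partial overlap, extend the previous redaction instead of
    // emitting a second placeholder, so no detected text leaks through.
    cursor = span.end;
  }
  return out + text.slice(cursor);
}

const sample = "Contact Anna Nowak at anna@example.com.";
const redacted = redact(sample, [
  { start: 22, end: 38, label: "EMAIL" },
  { start: 8, end: 18, label: "PERSON" },
]);
console.log(redacted); // prints "Contact [PERSON] at [EMAIL]."
```

Overlapping spans are merged under the earlier placeholder rather than skipped, on the reasoning that over-redaction is the safer failure mode for a privacy tool.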

This demo

All inference runs locally in your browser using Transformers.js and WebAssembly. No text is sent to any server. This aligns with the data-protection-by-design principle of GDPR Art. 25 and the data minimisation principle of Art. 5(1)(c).

Built with Astro, deployed on Cloudflare Workers.

Regulatory relevance

Automated PII detection supports compliance with several EU provisions:

  • GDPR Art. 5(1)(c): data minimisation
  • GDPR Art. 25: data protection by design (pseudonymisation as a recommended measure)
  • GDPR Art. 32(1)(a): security of processing (pseudonymisation and encryption)
  • GDPR Art. 89: safeguards for research/statistical processing
  • EU AI Act Art. 10(5): conditions for processing special-category data for bias detection in high-risk AI systems, including pseudonymisation and a preference for anonymised or synthetic data
  • EU AI Act Art. 59: processing of personal data in AI regulatory sandboxes only where anonymised, synthetic, or other non-personal data would not suffice

This tool is a detection aid, not a guarantee of compliance. Always review results and consult data-protection expertise for production use.

License

Both the model and this demo are released under the Apache License 2.0. They are free to use, modify, and distribute for any purpose, including commercial use, subject to the terms of that license.

Contact

Questions or collaboration inquiries: GitHub or bards.ai.