# Integration Guide

The model can be integrated into your applications using Python (Transformers), JavaScript (Transformers.js), or directly via ONNX Runtime.

## Python — Hugging Face Transformers

The simplest way to use the model in Python:

```python
from transformers import pipeline

# Load the NER pipeline
ner = pipeline(
    "token-classification",
    model="bardsai/eu-pii-anonimization-multilang",
    aggregation_strategy="simple"
)

# Run inference ("My name is Jan Kowalski, my PESEL is 85031512345.")
text = "Nazywam się Jan Kowalski, mój PESEL to 85031512345."
results = ner(text)

for entity in results:
    print(f"{entity['entity_group']}: {entity['word']} ({entity['score']:.2%})")
```

Output:

```
PERSON_NAME: Jan Kowalski (98.5%)
PERSON_IDENTIFIER: 85031512345 (97.2%)
```
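
With an aggregation strategy set, each result also carries character offsets (`start`/`end`), so you can redact the detected spans directly. A minimal sketch; the `[LABEL]` placeholder format is an illustrative choice, not something the model prescribes:

```python
def anonymize(text: str, entities: list[dict]) -> str:
    """Replace each detected PII span with a [LABEL] placeholder."""
    # Process spans right-to-left so earlier offsets stay valid after replacement
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:entity["start"]] + f"[{entity['entity_group']}]" + text[entity["end"]:]
    return text

print(anonymize(text, results))
# Nazywam się [PERSON_NAME], mój PESEL to [PERSON_IDENTIFIER].
```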

### Batch Processing

For processing multiple documents:

```python
texts = [
    "Jan Kowalski, email: jan@example.com",
    "Anna Nowak, tel: +48 600 123 456",
]

results = ner(texts)
for text_results in results:
    for entity in text_results:
        print(f"  {entity['entity_group']}: {entity['word']}")
```

## JavaScript — Transformers.js

Use the model in the browser or Node.js with Transformers.js:

```javascript
import { pipeline } from "@huggingface/transformers";

// Load the pipeline (downloads model on first run)
const ner = await pipeline(
  "token-classification",
  "bardsai/eu-pii-anonimization-multilang"
);

// Run inference
const text = "Nazywam się Jan Kowalski, mój PESEL to 85031512345.";
const results = await ner(text);

console.log(results);
```
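
Note that, at the time of writing, Transformers.js does not implement `aggregation_strategy`, so `results` typically holds one entry per token with raw IOB tags (for example `B-PERSON_NAME`) rather than aggregated entity groups; merge consecutive `B-`/`I-` entries yourself if you need span-level output.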

### Web Worker (Browser)

For browser use, run inference in a Web Worker to avoid blocking the main thread. This is the approach used by this demo:

```javascript
// worker.js
import { pipeline } from "@huggingface/transformers";

let ner = null;

self.onmessage = async (e) => {
  if (e.data.type === "load") {
    ner = await pipeline(
      "token-classification",
      "bardsai/eu-pii-anonimization-multilang",
      { dtype: "q8" }  // Use quantized model
    );
    self.postMessage({ type: "loaded" });
  }

  if (e.data.type === "classify") {
    if (!ner) return;  // Ignore requests that arrive before the model is loaded
    const results = await ner(e.data.text);
    self.postMessage({ type: "result", data: results });
  }
};
```
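
On the main thread, instantiate the worker as a module, for example `new Worker(new URL("./worker.js", import.meta.url), { type: "module" })`, send `{ type: "load" }` once at startup, and handle the `loaded` and `result` messages in `worker.onmessage`. The message shapes above are just this example's convention, not an API requirement.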

## ONNX Runtime — Direct

For maximum control, use ONNX Runtime directly:

### Python

```python
import numpy as np
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForTokenClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("bardsai/eu-pii-anonimization-multilang")
model = ORTModelForTokenClassification.from_pretrained(
    "bardsai/eu-pii-anonimization-multilang"
)

# Tokenize and run
text = "Jan Kowalski, PESEL: 85031512345"
inputs = tokenizer(text, return_tensors="np")
outputs = model(**inputs)

# Process outputs
predictions = np.argmax(outputs.logits, axis=-1)
```
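
The argmax yields class ids, not label names. A minimal sketch of mapping them back, assuming the checkpoint's `config.json` provides `id2label`, as Transformers token-classification models normally do:

```python
# Convert ids to tokens and class ids to label names
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
labels = [model.config.id2label[int(i)] for i in predictions[0]]

for token, label in zip(tokens, labels):
    if label != "O":  # skip the "outside" (non-PII) label
        print(token, label)
```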

### Node.js

```javascript
const ort = require("onnxruntime-node");

async function runInference(modelPath, inputIds, attentionMask) {
  const session = await ort.InferenceSession.create(modelPath);

  // ONNX Runtime int64 tensors require BigInt64Array data,
  // so convert plain number arrays before building the feeds
  const ids = BigInt64Array.from(inputIds, BigInt);
  const mask = BigInt64Array.from(attentionMask, BigInt);

  const feeds = {
    input_ids: new ort.Tensor("int64", ids, [1, ids.length]),
    attention_mask: new ort.Tensor("int64", mask, [1, mask.length]),
  };

  const results = await session.run(feeds);
  return results.logits;
}
```
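
onnxruntime-node does not ship a tokenizer, so you must produce `input_ids` and `attention_mask` yourself, for example with `AutoTokenizer` from `@huggingface/transformers` or by precomputing them offline; whatever you use must match this model's tokenizer.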

## Model Configuration

Key configuration options:

| Option | Description | Default |
|--------|-------------|---------|
| `aggregation_strategy` | How to aggregate sub-word tokens: `simple`, `first`, `average`, `max` | `simple` |
| `dtype` | Model precision: `fp32`, `fp16`, `q8` (quantized) | `fp32` |
| `device` | Inference device: `cpu`, `cuda`, `wasm` | auto |
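
These options span the two APIs: `aggregation_strategy` is an argument of the Python pipeline, while `dtype` and the `wasm` device are Transformers.js options. As a sketch, the Python options are passed when the pipeline is created (choose the device that matches your hardware):

```python
ner = pipeline(
    "token-classification",
    model="bardsai/eu-pii-anonimization-multilang",
    aggregation_strategy="max",
    device="cpu",  # or device=0 for the first CUDA GPU
)
```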

### Quantized vs Full Precision

| Variant | Size | Speed | Accuracy |
|---------|------|-------|----------|
| Full (FP32) | ~440 MB | Baseline | Best |
| Quantized (INT8) | ~110 MB | ~2-4x faster | Minimal loss |

For browser and edge deployments, the quantized model is recommended. The accuracy difference is negligible for most use cases.