Integration Guide
The model can be integrated into your applications using Python (Transformers), JavaScript (Transformers.js), or directly via ONNX Runtime.
Python — Hugging Face Transformers
The simplest way to use the model in Python:
from transformers import pipeline

# Load the NER pipeline
ner = pipeline(
    "token-classification",
    model="bardsai/eu-pii-anonimization-multilang",
    aggregation_strategy="simple",
)

# Run inference ("My name is Jan Kowalski, my PESEL is 85031512345.")
text = "Nazywam się Jan Kowalski, mój PESEL to 85031512345."
results = ner(text)

for entity in results:
    print(f"{entity['entity_group']}: {entity['word']} ({entity['score']:.1%})")
Output:
PERSON_NAME: Jan Kowalski (98.5%)
PERSON_IDENTIFIER: 85031512345 (97.2%)
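Because the model is meant for anonymization, you will usually want to replace the detected spans rather than only list them. With an aggregation strategy set, each result also carries start and end character offsets, so a small helper can mask the spans in place. A minimal sketch building on the example above; the anonymize helper and the bracketed placeholder format are illustrative, not part of the model:

def anonymize(text, entities):
    # Work right-to-left so earlier character offsets stay valid
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{entity['entity_group']}]"
        text = text[:entity["start"]] + placeholder + text[entity["end"]:]
    return text

print(anonymize(text, results))
# Nazywam się [PERSON_NAME], mój PESEL to [PERSON_IDENTIFIER].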
Batch Processing
For processing multiple documents:
texts = [
    "Jan Kowalski, email: jan@example.com",
    "Anna Nowak, tel: +48 600 123 456",
]

results = ner(texts)

for text_results in results:
    for entity in text_results:
        print(f" {entity['entity_group']}: {entity['word']}")
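When the pipeline receives a list it runs the texts one by one by default; for larger workloads you can let it batch the forward passes with the standard batch_size argument. A minimal sketch; the best value depends on your hardware and typical document length:

# Tune batch_size for your hardware (larger batches help mainly on GPU)
results = ner(texts, batch_size=8)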
JavaScript — Transformers.js
Use the model in the browser or Node.js with Transformers.js:
import { pipeline } from "@huggingface/transformers";

// Load the pipeline (downloads model on first run)
const ner = await pipeline(
  "token-classification",
  "bardsai/eu-pii-anonimization-multilang"
);

// Run inference
const text = "Nazywam się Jan Kowalski, mój PESEL to 85031512345.";
const results = await ner(text);
console.log(results);
Web Worker (Browser)
For browser use, run inference in a Web Worker to avoid blocking the main thread. This is the approach used by this demo:
// worker.js
import { pipeline } from "@huggingface/transformers";

let ner = null;

self.onmessage = async (e) => {
  if (e.data.type === "load") {
    ner = await pipeline(
      "token-classification",
      "bardsai/eu-pii-anonimization-multilang",
      { dtype: "q8" } // Use quantized model
    );
    self.postMessage({ type: "loaded" });
  }

  if (e.data.type === "classify") {
    const results = await ner(e.data.text);
    self.postMessage({ type: "result", data: results });
  }
};
ONNX Runtime — Direct
For maximum control, use ONNX Runtime directly:
Python
import numpy as np
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForTokenClassification
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("bardsai/eu-pii-anonimization-multilang")
model = ORTModelForTokenClassification.from_pretrained(
    "bardsai/eu-pii-anonimization-multilang"
)
# Tokenize and run
text = "Jan Kowalski, PESEL: 85031512345"
inputs = tokenizer(text, return_tensors="np")
outputs = model(**inputs)
# Process outputs
predictions = np.argmax(outputs.logits, axis=-1)
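The argmax yields numeric label IDs per token. To turn them into something readable, map them through the model config's id2label and pair them with the tokens. A minimal sketch using the objects created above, assuming the usual tagging scheme where non-entity tokens are labeled O:

# Map predicted label IDs back to label names, token by token
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
labels = [model.config.id2label[int(i)] for i in predictions[0]]

for token, label in zip(tokens, labels):
    if label != "O":
        print(f"{token}: {label}")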
Node.js
const ort = require("onnxruntime-node");

// Note: int64 tensors expect BigInt64Array data, so inputIds and
// attentionMask should be BigInt64Array values of equal length.
async function runInference(modelPath, inputIds, attentionMask) {
  const session = await ort.InferenceSession.create(modelPath);
  const feeds = {
    input_ids: new ort.Tensor("int64", inputIds, [1, inputIds.length]),
    attention_mask: new ort.Tensor("int64", attentionMask, [1, attentionMask.length]),
  };
  const results = await session.run(feeds);
  return results.logits;
}
Model Configuration
Key configuration options:
| Option | Description | Default |
|---|---|---|
| aggregation_strategy | How to aggregate sub-word tokens: simple, first, average, max | simple |
| dtype | Model precision: fp32, fp16, q8 (quantized) | fp32 |
| device | Inference device: cpu, cuda, wasm | auto |
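Note that aggregation_strategy and device apply to the Python pipeline, while dtype (and the wasm device) are Transformers.js loading options, as in the Web Worker example above. In Python the relevant options are passed when the pipeline is constructed; a minimal sketch, with the device string depending on your setup and transformers version:

ner = pipeline(
    "token-classification",
    model="bardsai/eu-pii-anonimization-multilang",
    aggregation_strategy="first",  # merge sub-word tokens using the first tag
    device="cpu",                  # or "cuda" / a GPU index such as 0
)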
Quantized vs Full Precision
| Variant | Size | Speed | Accuracy |
|---|---|---|---|
| Full (FP32) | ~440 MB | Baseline | Best |
| Quantized (INT8) | ~110 MB | ~2-4x faster | Minimal loss |
For browser and edge deployments, the quantized model is recommended. The accuracy difference is negligible for most use cases.
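In Python, a quantized export can also be loaded explicitly through optimum's file_name argument. A minimal sketch under the assumption that the repository ships a Transformers.js-style INT8 export under onnx/model_quantized.onnx; check the model files on the Hub and adjust the names before relying on this:

from optimum.onnxruntime import ORTModelForTokenClassification

# Assumed file layout: onnx/model_quantized.onnx -- verify it exists in the repo
model_int8 = ORTModelForTokenClassification.from_pretrained(
    "bardsai/eu-pii-anonimization-multilang",
    subfolder="onnx",
    file_name="model_quantized.onnx",
)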