Classifier Intelligence Engine

The AI Layer That Makes Sense Unbeatable

18 specialized models. 5 detection layers. Every model runs exclusively on Segmento's private infrastructure — your data never reaches a third-party AI provider.


4A · Primary NER

4B · GLiNER

4C · Spatial Analysis

4D · Lang Detection

Framework

By The Numbers

An ensemble built for precision

Total Models in Stack

Currently Deployed

Fully Trainable

0.9963

Peak F1 Score

Model Stack

18 models. 5 detection layers.

Each sub-layer targets a specific challenge. Together, they form an ensemble no single model can match.

4A — Primary NER: ModernBERT & DeBERTa models for high-accuracy Named Entity Recognition

4A · In Use · Trainable

joneauxedgar/pasteproof-pii-detector-v2

ModernBERT-base (149M)

ModernBERT-base with 149M parameters. Covers 27 PII types across PCI/HIPAA/GDPR frameworks. Trained on 150K synthetic examples with BIO tagging.

F1 0.970

F1 (held-out)

~120ms GPU

4A · In Use · Trainable

llm-semantic-router/mmbert32k-pii-detector-merged

ModernBERT 307M + YaRN (32K context)

307M parameter ModernBERT with YaRN extending context to 32K tokens. Unmatched open-source coverage for long-form legal and compliance documents.

F1 0.969

F1 (reported)

~400ms GPU

4A · Coming Soon · Trainable

OpenMed PII Family (FR/DE/IT variants)

ModernBERT-large (395M)

ModernBERT-large (395M) covering 55+ EU-localized PII types including French NSS and Italian Codice Fiscale. Multilingual healthcare focus.

F1 > 0.960

F1 (reported)

~180ms GPU

4A · In Use · Trainable

iiiorg/piiranha-v1-detect-personal-information

DeBERTa-v3-base

DeBERTa-v3-base with 99.44% binary accuracy. The fastest deployed model in the 4A sub-layer at ~25ms on GPU. Ideal as a high-speed first-pass gate across 6 languages.

99.44% acc · F1 0.931

Binary Acc / F1-macro

~25ms GPU

4A · Coming Soon · Trainable

exdsgift/NerGuard-0.3B

mDeBERTa-v3-base (0.3B)

mDeBERTa-v3-base with the highest throughput in the 4A sub-layer. 33ms median latency across 8 EU languages. Ideal for real-time pipelines.

F1-macro 0.9963

F1-macro (in-dist)

33ms GPU

4A · In Use · Trainable

lakshyakh93/deberta_finetuned_pii

DeBERTa-v3-base

DeBERTa-v3-base fine-tuned on ai4privacy/pii-masking-300k. Serves as a warm fallback and general-purpose baseline in the 4A ensemble.

F1 ~0.920

F1 (est.)

~30ms GPU


Technical Comparison

Every model. Every metric.


Model	Layer	Architecture	Top Metric	Context Window	Best For	Trainable	Latency
joneauxedgar/pasteproof-pii-detector-v2 (In Use)	4A	ModernBERT-base (149M)	F1 0.970 (held-out)	8,192 tokens	Long compliance docs, leakage prevention, intentional variation coverage	Yes	~120ms GPU
llm-semantic-router/mmbert32k-pii-detector-merged (In Use)	4A	ModernBERT 307M + YaRN (32K context)	F1 0.969 (reported)	32,768 tokens	Extreme-length documents, legal contracts, batch reports with dense PII	Yes	~400ms GPU
OpenMed PII Family (FR/DE/IT variants)	4A	ModernBERT-large (395M)	F1 > 0.960 (reported)	8,192 tokens	EU multilingual (FR/DE/IT), GDPR localized entity formats, healthcare records	Yes	~180ms GPU
iiiorg/piiranha-v1-detect-personal-information (In Use)	4A	DeBERTa-v3-base	99.44% binary accuracy · F1-macro 0.931	512 tokens (sub-256 optimal)	High-speed short-segment screening, multilingual real-time API, binary PII gate	Yes	~25ms GPU
exdsgift/NerGuard-0.3B	4A	mDeBERTa-v3-base (0.3B)	F1-macro 0.9963 (in-dist)	512 tokens	Ultra-low latency EU multilingual, high-throughput real-time pipelines	Yes	33ms GPU
lakshyakh93/deberta_finetuned_pii (In Use)	4A	DeBERTa-v3-base	F1 ~0.920 (est.)	512 tokens	General-purpose PII baseline, interpretable benchmark	Yes	~30ms GPU
knowledgator/gliner-pii-large-v1.0	4B	GLiNER-large (bi-encoder)	F1 0.833 · Precision 0.874	512 tokens	Minimizing false positives, broadest entity coverage, production compliance audits	Yes	~45ms GPU
nvidia/gliner-PII-0.1 (In Use)	4B	GLiNER DeBERTa (570M)	Strict F1 0.870	512 tokens	Enterprise compliance, healthcare / finance / legal tri-domain	Yes	~60ms GPU
gretelai/gretel-gliner-bi-large-v1.0	4B	GLiNER-large (bidirectional)	F1 0.950 (internal bench)	512 tokens	Dual PII+PHI detection in one pass, HIPAA + GDPR simultaneously	Yes	~50ms GPU
OvermindLab/nerpa	4B	GLiNER2 (unified NER + structured extraction)	Micro-Precision 0.930	512 tokens	Disambiguation of overlapping entity types, beats AWS Comprehend	Yes	~55ms GPU
urchade/gliner_small-v2.1 (In Use)	4B	GLiNER-small (DeBERTa-v3-small encoder)	F1 ~0.850 (general NER)	512 tokens	Zero-shot custom entities, prototyping new PII types, ultra-fast inference	Yes	~15ms GPU
Surya OCR (Datalab)	4C	Detection + segmentation models	Table detection: Precision 0.99 · Recall 0.96	Full page canvas	Scanned document pre-processing, reading order correction, bounding box extraction	Partial	~620ms/page GPU
nielsr/layoutlmv3-finetuned-cord	4C	LayoutLMv3-base (multimodal)	F1 0.9638 (CORD)	512 tokens + image patches	Receipt & invoice spatial PII extraction, structured financial document parsing	Yes	~200ms GPU (incl. OCR)
nielsr/layoutlmv3-finetuned-funsd	4C	LayoutLMv3-base (multimodal)	F1 0.9078 (FUNSD)	512 tokens + image patches	Scanned form key-value extraction, insurance/government forms	Yes	~200ms GPU (incl. OCR)
parthesh111/layoutlmv3-finetune-bioes-new	4C	LayoutLMv3-base + PaddleOCR	F1 ~0.920 (medical lab reports)	512 tokens + image patches	Scanned medical lab report de-identification, HIPAA PHI spatial extraction	Yes	~250ms GPU (incl. OCR)
fast-langdetect (FastText lite)	4D	FastText (bag-of-n-grams classifier)	~98% top-1 accuracy (common langs)	Sentence/paragraph	Edge-level language routing, <1ms CPU, GPU-free classification	Limited	<1ms CPU
cis-lmu/glotlid (V3)	4D	FastText-based (character n-grams)	2,102 language labels (coverage)	Sentence/paragraph	Low-resource & obscure dialect routing, preventing metadata leakage	Limited	<2ms CPU
Microsoft Presidio (In Use)	Framework	Rule-based + spaCy NER + custom recognizers	99%+ structured entities · ~80% names (accuracy)	Unlimited (chunked internally)	Orchestration layer, rule-based PII (regex), plugging in any model above	Customizable	~10–50ms CPU
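
Several models in the comparison accept at most 512 tokens, so longer documents have to be split before inference. The following is a minimal sketch of overlapping-window chunking, not a description of how Sense or Presidio chunk internally. It assumes whitespace-separated words stand in for a real subword tokenizer, and `chunk_tokens`, the 512-token default, and the 64-token overlap are illustrative names and values.

```python
def chunk_tokens(tokens, window=512, overlap=64):
    """Split a token list into overlapping windows.

    Returns (start_offset, chunk) pairs so detections found in a chunk
    can be mapped back to positions in the full document.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be >= 0 and smaller than window")
    step = window - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append((start, tokens[start:start + window]))
        if start + window >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Hypothetical 1,000-token document.
tokens = [f"t{i}" for i in range(1000)]
chunks = chunk_tokens(tokens)
starts = [start for start, _ in chunks]
```

The overlap keeps an entity that straddles a window boundary fully visible in at least one chunk, at the cost of some repeated inference.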

Why Segmento Sense

Built different. By design.

These aren't marketing checkboxes. They are deliberate architectural decisions that took 18 months to get right.

You Always Know Why — Not Just What

Most PII tools hand you a list of findings and leave you guessing. Sense shows you which model flagged each entity, which rule triggered it, and the exact text span that caused it. Transparency isn't a feature — it's the foundation.
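
As an illustration of what a per-finding explanation could carry, here is a minimal sketch of a record holding the model, rule, and exact span. The `Finding` class, its field names, and the sample values (`model-a`, `email_pattern`) are hypothetical and not taken from the Sense API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    entity_type: str  # e.g. "EMAIL"
    start: int        # character offset where the span begins
    end: int          # character offset just past the span
    model: str        # which model flagged the entity (hypothetical name)
    rule: str         # which rule triggered (hypothetical name)
    score: float      # model confidence in [0, 1]


def explain(text: str, f: Finding) -> str:
    """Render a one-line explanation that quotes the exact flagged span."""
    span = text[f.start:f.end]
    return (
        f"{f.entity_type} '{span}' [{f.start}:{f.end}] "
        f"flagged by {f.model} via rule {f.rule} (score {f.score:.2f})"
    )


text = "Contact jane@example.com for access."
email = "jane@example.com"
start = text.index(email)
finding = Finding("EMAIL", start, start + len(email), "model-a", "email_pattern", 0.97)
report = explain(text, finding)
```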

You Control Precision vs. Recall

A single model gives you a single threshold. Our Consensus Engine lets you dial a confidence slider: a low confidence threshold flags aggressively (maximizes recall for regulated environments), while a high confidence threshold flags conservatively (minimizes false positives for high-throughput pipelines). You choose the tradeoff.
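
One way to picture the precision/recall tradeoff is a threshold applied to an ensemble-wide score. The sketch below is an assumption about how such a consensus could work, not the Consensus Engine's actual algorithm: each model emits a (span, score) pair, a model that stays silent counts as zero, and a span is flagged when its average score across the ensemble meets the threshold. The function name `consensus` and the sample scores are illustrative.

```python
from collections import defaultdict


def consensus(detections, n_models, threshold):
    """Return spans whose mean score across all n_models meets the threshold.

    detections: iterable of ((start, end, label), score), one per model hit.
    Models that did not flag a span contribute 0 to its mean.
    """
    totals = defaultdict(float)
    for span, score in detections:
        totals[span] += score
    return sorted(span for span, total in totals.items()
                  if total / n_models >= threshold)


# Hypothetical output of a three-model ensemble on one document.
detections = [
    ((8, 24, "EMAIL"), 0.98),  # model A
    ((8, 24, "EMAIL"), 0.91),  # model B
    ((8, 24, "EMAIL"), 0.95),  # model C
    ((0, 7, "NAME"), 0.40),    # model A only, weak signal
]

aggressive = consensus(detections, n_models=3, threshold=0.10)
conservative = consensus(detections, n_models=3, threshold=0.80)
```

With the low threshold both spans are flagged, including the weak single-model NAME hit. With the high threshold only the EMAIL span that all three models agree on survives.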

Works Completely Offline — No Cloud Required

Banks, hospitals, and defense contractors often cannot send sensitive data to SaaS tools. Sense runs all 18 models on your own private infrastructure with zero external API calls. Air-gap deployments are fully supported. Your data never crosses a network boundary it shouldn't.

Replace Real PII With Valid Synthetic Data

Redacting PII for test environments usually breaks data integrity. Sense replaces real PII with structurally valid synthetic data: correctly formatted SSNs, card numbers that pass Luhn checks, IBANs with correct checksums, and email addresses that match real domains. Your test data stays usable.
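
The checksums involved can be sketched in a few lines. This is an illustrative generator, not Sense's implementation. Luhn is shown on a payment-card-style number, which is where that checksum is normally used, and the IBAN check digits follow the standard mod-97 scheme. The function names, the leading "4", and the 18-digit German-style account part are assumptions made for the example.

```python
import random


def luhn_check_digit(payload: str) -> str:
    """Compute the Luhn check digit to append to a digit string."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:  # rightmost payload digit is doubled, then every second one
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # check digit itself is not doubled
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def _to_numeric(s: str) -> str:
    # IBAN letter mapping: A=10 ... Z=35, digits unchanged.
    return "".join(str(int(c, 36)) for c in s)


def iban_check_digits(country: str, bban: str) -> str:
    return f"{98 - int(_to_numeric(bban + country + '00')) % 97:02d}"


def iban_valid(iban: str) -> bool:
    return int(_to_numeric(iban[4:] + iban[:4])) % 97 == 1


def synthetic_card(rng: random.Random, length: int = 16) -> str:
    payload = "4" + "".join(str(rng.randrange(10)) for _ in range(length - 2))
    return payload + luhn_check_digit(payload)


def synthetic_iban(rng: random.Random, country: str = "DE", bban_length: int = 18) -> str:
    bban = "".join(str(rng.randrange(10)) for _ in range(bban_length))
    return country + iban_check_digits(country, bban) + bban


rng = random.Random(0)  # seeded so the example is repeatable
card = synthetic_card(rng)
iban = synthetic_iban(rng)
```

Values generated this way pass format and checksum validation in downstream test code without corresponding to any real account.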

Generate SQL & Python Fix Scripts Automatically

Sense bridges the gap between the Security team that finds PII and the Data Engineering team that fixes it. One click generates a ready-to-run remediation script for the exact columns and tables flagged. No more "we found 3,000 PII fields" without a path forward.
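
To make the idea concrete, here is a minimal sketch of turning flagged columns into a SQL script. The format of Sense's generated scripts is not documented here, so everything below is an assumption: the `build_masking_sql` name, the (table, column, entity type) input shape, and the choice of nulling the column as the remediation.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_masking_sql(findings):
    """Build one UPDATE statement per flagged (table, column) pair.

    findings: iterable of (table, column, entity_type) tuples.
    Identifiers are validated so arbitrary text never reaches the SQL string.
    """
    statements = []
    for table, column, entity_type in sorted(set(findings)):
        for name in (table, column):
            if not _IDENTIFIER.fullmatch(name):
                raise ValueError(f"unsafe identifier: {name!r}")
        statements.append(
            f'UPDATE "{table}" SET "{column}" = NULL;  -- flagged as {entity_type}'
        )
    return "\n".join(statements)


# Hypothetical scan result; the duplicate entry is collapsed.
findings = [
    ("customers", "email", "EMAIL"),
    ("customers", "ssn", "SSN"),
    ("customers", "email", "EMAIL"),
]
script = build_masking_sql(findings)
```

A real remediation script would more likely tokenize or encrypt the column than null it; the point is only that the flagged table and column names drive the generated statements.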

The Industry Reality

Every major vendor sends your data to the cloud.
We built Sense to be the exception.

Whether it's a cloud platform charging per GB, an enterprise DLP tool requiring weeks of setup, or an AI-first SaaS that runs your sensitive documents through external model APIs — the industry's default is to move your data.

Segmento Sense runs all 18 models on your private infrastructure. Zero third-party AI access. Zero data egress. Zero compliance risk from vendor-side breaches. Your PII stays where it belongs: with you.

Cloud Platforms

AWS Macie, Google Cloud DLP, Microsoft Purview

Data leaves your perimeter

Enterprise DLP

Symantec, Trellix, Spirion, Digital Guardian

Complex, expensive, opaque

AI-First SaaS Tools

Nightfall, Private AI, BigID, Varonis

Training on your sensitive data

vs Segmento Sense

Segmento Sense

18 self-hosted models · Full offline capability · Zero third-party AI

Your data stays with you

See It In Action

Ready to see the AI engine work for your data?

Upload a document. Watch all 18 models work in concert. See exactly which model flagged what entity, and why — in real time.

