This package provides data-science tasks for developing new recognizers for Presidio. It is used to evaluate the entire system, as well as specific PII recognizers or PII detection models.
☆294 · Jun 30, 2026 · Updated this week
Alternatives and similar repositories for presidio-research
Users that are interested in presidio-research are comparing it to the libraries listed below.
- An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data… ☆9,714 · Jun 26, 2026 · Updated last week
- Application and python script to identify, remove, and/or recode personally identifiable information (PII) from field experiment datasets… ☆47 · Jan 7, 2026 · Updated 5 months ago
- A package to build an end-to-end pipeline for detecting personally identifiable information from text. ☆49 · Jun 2, 2019 · Updated 7 years ago
- Annotated corpus + evaluation metrics for text anonymisation ☆76 · Jan 19, 2026 · Updated 5 months ago
- A research python package for detecting, categorizing, and assessing the severity of personal identifiable information (PII) ☆101 · Feb 15, 2026 · Updated 4 months ago
- ReconNER, Debug annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality of your data. ☆34 · Jul 26, 2020 · Updated 5 years ago
- Robust de-identification of medical notes using transformer architectures ☆62 · Jun 27, 2022 · Updated 4 years ago
- KIND: an Italian Multi-Domain Dataset for Named Entity Recognition ☆14 · Jun 28, 2023 · Updated 3 years ago
- Unofficial Python client for Azure cognitive search ☆11 · Jun 7, 2019 · Updated 7 years ago
- Finds linguistic patterns effortlessly ☆40 · Aug 29, 2023 · Updated 2 years ago
- Library for identification, anonymization and de-anonymization of PII data ☆22 · Dec 26, 2022 · Updated 3 years ago
- An AI-powered Personal Identifiable Information (PII) scanner. ☆732 · Jan 22, 2025 · Updated last year
- ☆12 · Jun 25, 2024 · Updated 2 years ago
- Metadata and data identification tool and Python library. Identifies PII, common identifiers, language specific identifiers. Fully custom… ☆46 · Jan 1, 2026 · Updated 6 months ago
- Accompanying material to the course "Building Recommendation Systems in Azure" on the Microsoft Virtual Academy. ☆16 · Oct 2, 2015 · Updated 10 years ago
- aicreator for aidata ☆14 · May 17, 2023 · Updated 3 years ago
- A Python module that provides multiple anonymization techniques for text (This is only a prototype) ➡️ The project has moved to: https://… ☆26 · Mar 20, 2026 · Updated 3 months ago
- SpanMarker for Named Entity Recognition ☆476 · Apr 10, 2026 · Updated 2 months ago
- Knowledge Extraction For Forms Accelerators & Examples ☆223 · Jul 9, 2024 · Updated last year
- In browser active learning and guided search ☆17 · May 6, 2023 · Updated 3 years ago
- PyTorch ObjectDetection Modules and ONNX ops ☆18 · Jun 12, 2023 · Updated 3 years ago
- ☆10 · Jul 12, 2023 · Updated 2 years ago
- Clean personally identifiable information from dirty dirty text. ☆430 · Sep 1, 2023 · Updated 2 years ago
- tempeh is a framework to TEst Machine learning PErformance exHaustively which includes tracking memory usage and run time. ☆18 · Jan 3, 2022 · Updated 4 years ago
- Fuzzy matching and more functionality for spaCy. ☆258 · Jul 6, 2024 · Updated last year
- Extract Molecular SMILES embeddings from language models pre-trained with various objectives architectures. ☆19 · Nov 9, 2023 · Updated 2 years ago
- An NLP pipeline for COVID-19 surveillance used in the Department of Veterans Affairs Biosurveillance. ☆15 · Oct 20, 2022 · Updated 3 years ago
- A Python interface for NIH Reporter APIs ☆12 · Feb 4, 2025 · Updated last year
- A spaCy wrapper for GliNER ☆135 · Jan 29, 2025 · Updated last year
- Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13 ☆219 · Mar 12, 2026 · Updated 3 months ago
- Language detection using Spacy and Fasttext ☆54 · Dec 17, 2023 · Updated 2 years ago
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) ☆3,341 · Jun 16, 2026 · Updated 2 weeks ago
- Clean personally identifiable information from dirty dirty text using spaCy. ☆41 · Sep 1, 2023 · Updated 2 years ago
- This sample project shows off how to prepare and deploy to Azure Web Apps a simple Python web service with an image classifying model pro… ☆27 · Feb 5, 2018 · Updated 8 years ago
- Demonstrate samples and good engineering practice for operationalizing machine learning solutions. ☆20 · Dec 2, 2021 · Updated 4 years ago
- Self-Supervision for Named Entity Disambiguation at the Tail ☆218 · Jun 14, 2022 · Updated 4 years ago
- 🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy ☆342 · Apr 25, 2025 · Updated last year
- CLK hash: hash pii for entity matching ☆47 · May 12, 2025 · Updated last year
- A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of lang… ☆1,573 · Updated this week