This package features data-science related tasks for developing new recognizers for Presidio. It is used for the evaluation of the entire system, as well as for evaluating specific PII recognizers or PII detection models.
☆268 · Mar 2, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for presidio-research
Users that are interested in presidio-research are comparing it to the libraries listed below.
- An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data… ☆7,314 · Mar 19, 2026 · Updated last week
- Application and python script to identify, remove, and/or recode personally identifiable information (PII) from field experiment datasets… ☆46 · Jan 7, 2026 · Updated 2 months ago
- A research python package for detecting, categorizing, and assessing the severity of personal identifiable information (PII) ☆98 · Feb 15, 2026 · Updated last month
- Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub ☆338 · Jan 5, 2024 · Updated 2 years ago
- ReconNER, Debug annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality of your data. ☆34 · Jul 26, 2020 · Updated 5 years ago
- Robust de-identification of medical notes using transformer architectures ☆59 · Jun 27, 2022 · Updated 3 years ago
- Research simulation toolkit for federated learning ☆13 · Nov 7, 2020 · Updated 5 years ago
- KIND: an Italian Multi-Domain Dataset for Named Entity Recognition ☆15 · Jun 28, 2023 · Updated 2 years ago
- Unofficial Python client for Azure cognitive search ☆11 · Jun 7, 2019 · Updated 6 years ago
- Library for identification, anonymization and de-anonymization of PII data ☆22 · Dec 26, 2022 · Updated 3 years ago
- ☆11 · Jun 25, 2024 · Updated last year
- Metadata and data identification tool and Python library. Identifies PII, common identifiers, language specific identifiers. Fully custom… ☆46 · Jan 1, 2026 · Updated 2 months ago
- Accompanying material to the course "Building Recommendation Systems in Azure" on the Microsoft Virtual Academy. ☆16 · Oct 2, 2015 · Updated 10 years ago
- aicreator for aidata ☆14 · May 17, 2023 · Updated 2 years ago
- A Python module that provides multiple anonymization techniques for text (This is only a prototype) ➡️ The project has moved to: https://… ☆26 · Mar 20, 2026 · Updated last week
- The code of EMNLP 2019 paper "A Split-and-Recombine Approach for Follow-up Query Analysis" ☆18 · Jul 20, 2023 · Updated 2 years ago
- In browser active learning and guided search ☆17 · May 6, 2023 · Updated 2 years ago
- This is the implementation of the TextNAS algorithm proposed in the paper TextNAS: A Neural Architecture Search Space tailored for Text R… ☆15 · Nov 28, 2022 · Updated 3 years ago
- Microsoft Cognitive Services, Computer Vision API, OCR Visualizer on documents ☆19 · Dec 8, 2022 · Updated 3 years ago
- Fuzzy matching and more functionality for spaCy. ☆258 · Jul 6, 2024 · Updated last year
- Capstone project for Galvanize - Data Science Immersive. 'Project Plotline' looks at the emotional content of movie scripts (web scraping… ☆16 · Sep 27, 2016 · Updated 9 years ago
- Extract Molecular SMILES embeddings from language models pre-trained with various objectives architectures. ☆18 · Nov 9, 2023 · Updated 2 years ago
- ☆17 · Jan 13, 2025 · Updated last year
- SpanMarker for Named Entity Recognition ☆465 · Jan 8, 2025 · Updated last year
- This is a prototype of a multi-lingual suite for named-entity recognition in Python. ➡️ The project has moved to: https://gitlab.opencode… ☆21 · Mar 20, 2026 · Updated last week
- A demonstration of the paper NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings ☆39 · Sep 13, 2025 · Updated 6 months ago
- skweak: A software toolkit for weak supervision applied to NLP tasks ☆926 · Sep 2, 2024 · Updated last year
- A spaCy wrapper for GliNER ☆132 · Jan 29, 2025 · Updated last year
- Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13 ☆207 · Mar 12, 2026 · Updated 2 weeks ago
- 📖The Big-&-Extending-Repository-of-Transformers: Pretrained PyTorch models for Google's BERT, OpenAI GPT & GPT-2, Google/CMU Transformer… ☆10 · Dec 4, 2020 · Updated 5 years ago
- ☆14 · Feb 1, 2021 · Updated 5 years ago
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 ☆2,961 · Mar 19, 2026 · Updated last week
- ☆10 · Jan 3, 2023 · Updated 3 years ago
- Clean personally identifiable information from dirty dirty text using spaCy. ☆41 · Sep 1, 2023 · Updated 2 years ago
- This sample project shows off how to prepare and deploy to Azure Web Apps a simple Python web service with an image classifying model pro… ☆26 · Feb 5, 2018 · Updated 8 years ago
- Self-Supervision for Named Entity Disambiguation at the Tail ☆218 · Jun 14, 2022 · Updated 3 years ago
- 🧪 Cutting-edge experimental spaCy components and features ☆105 · Apr 23, 2024 · Updated last year
- CLK hash: hash pii for entity matching ☆48 · May 12, 2025 · Updated 10 months ago
- A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of lang… ☆1,564 · Jun 12, 2025 · Updated 9 months ago