This package provides data science tooling for developing new recognizers for Presidio. It can be used to evaluate the entire system, as well as specific PII recognizers or PII detection models.
☆266 · Jan 6, 2026 · Updated 2 months ago
Alternatives and similar repositories for presidio-research
Users who are interested in presidio-research compare it to the libraries listed below.
- Application and python script to identify, remove, and/or recode personally identifiable information (PII) from field experiment datasets… ☆46 · Jan 7, 2026 · Updated 2 months ago
- An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data… ☆7,068 · Updated this week
- A CLI for identifying potential Personally Identifiable Information in datasets. ☆14 · Apr 9, 2019 · Updated 6 years ago
- A research Python package for detecting, categorizing, and assessing the severity of personally identifiable information (PII) ☆98 · Feb 15, 2026 · Updated 2 weeks ago
- Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub ☆338 · Jan 5, 2024 · Updated 2 years ago
- Finds linguistic patterns effortlessly ☆39 · Aug 29, 2023 · Updated 2 years ago
- Search for PII in Python ☆31 · Jan 29, 2024 · Updated 2 years ago
- ReconNER, Debug annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality of your data. ☆35 · Jul 26, 2020 · Updated 5 years ago
- Extract Molecular SMILES embeddings from language models pre-trained with various objectives and architectures. ☆18 · Nov 9, 2023 · Updated 2 years ago
- ☆13 · Nov 21, 2025 · Updated 3 months ago
- Robust de-identification of medical notes using transformer architectures ☆58 · Jun 27, 2022 · Updated 3 years ago
- Research simulation toolkit for federated learning ☆13 · Nov 7, 2020 · Updated 5 years ago
- Clean personally identifiable information from dirty dirty text using spaCy. ☆41 · Sep 1, 2023 · Updated 2 years ago
- Unofficial Python client for Azure cognitive search ☆11 · Jun 7, 2019 · Updated 6 years ago
- aicreator for aidata ☆13 · May 17, 2023 · Updated 2 years ago
- The repository contains the code for analysing the leakage of personally identifiable (PII) information from the output of next word pred… ☆104 · Aug 13, 2024 · Updated last year
- ☆28 · Apr 28, 2024 · Updated last year
- Software for developing sparse, performant, multitask artificial neural networks ☆33 · Jan 4, 2024 · Updated 2 years ago
- ☆10 · Feb 17, 2024 · Updated 2 years ago
- This is the implementation of the TextNAS algorithm proposed in the paper TextNAS: A Neural Architecture Search Space tailored for Text R… ☆15 · Nov 28, 2022 · Updated 3 years ago
- ☆13 · Jan 22, 2025 · Updated last year
- The code of EMNLP 2019 paper "A Split-and-Recombine Approach for Follow-up Query Analysis" ☆18 · Jul 20, 2023 · Updated 2 years ago
- Homebrew MCP: Comprehensive brew support for installing, upgrading, searching, and maintaining macOS packages. ☆25 · Jun 23, 2025 · Updated 8 months ago
- Fuzzy matching and more functionality for spaCy. ☆259 · Jul 6, 2024 · Updated last year
- Code for the paper "Automated Generation of Hospital Discharge Summaries Using Clinical Guidelines and Large Language Models" ☆11 · May 3, 2024 · Updated last year
- Apache Spark enhanced with native Kubernetes scheduler back-end ☆15 · Aug 21, 2023 · Updated 2 years ago
- Self-Supervision for Named Entity Disambiguation at the Tail ☆218 · Jun 14, 2022 · Updated 3 years ago
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 ☆2,865 · Updated this week
- 🧪 Cutting-edge experimental spaCy components and features ☆105 · Apr 23, 2024 · Updated last year
- Interface for GenAI-Arena [NeurIPS24] ☆17 · Feb 27, 2024 · Updated 2 years ago
- An NLP pipeline for COVID-19 surveillance used in the Department of Veterans Affairs Biosurveillance. ☆16 · Oct 20, 2022 · Updated 3 years ago
- spaCy pipeline object for negating concepts in text ☆282 · Jun 16, 2025 · Updated 8 months ago
- Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13 ☆206 · Feb 24, 2026 · Updated last week
- In this repository we test AutoML approaches for time-series forecasting ☆13 · Aug 2, 2018 · Updated 7 years ago
- 🚀 GUI for training spaCy models ☆55 · May 18, 2021 · Updated 4 years ago
- 🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy ☆335 · Apr 25, 2025 · Updated 10 months ago
- Generate reports for spaCy models. ☆29 · May 27, 2022 · Updated 3 years ago
- ☆43 · Feb 11, 2025 · Updated last year
- skweak: A software toolkit for weak supervision applied to NLP tasks ☆926 · Sep 2, 2024 · Updated last year