This package features data-science-related tasks for developing new recognizers for Presidio. It is used to evaluate the entire system, as well as specific PII recognizers or PII detection models.
☆273 · Mar 30, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for presidio-research
Users that are interested in presidio-research are comparing it to the libraries listed below.
- An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data… ☆7,557 · Apr 9, 2026 · Updated last week
- Application and python script to identify, remove, and/or recode personally identifiable information (PII) from field experiment datasets… ☆46 · Jan 7, 2026 · Updated 3 months ago
- A package to build an end-to-end pipeline for detecting personally identifiable information from text. ☆49 · Jun 2, 2019 · Updated 6 years ago
- A CLI for identifying potential Personally Identifiable Information in datasets. ☆14 · Apr 9, 2019 · Updated 7 years ago
- Annotated corpus + evaluation metrics for text anonymisation ☆72 · Jan 19, 2026 · Updated 2 months ago
- A research python package for detecting, categorizing, and assessing the severity of personal identifiable information (PII) ☆98 · Feb 15, 2026 · Updated 2 months ago
- ReconNER, Debug annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality of your data. ☆34 · Jul 26, 2020 · Updated 5 years ago
- Robust de-identification of medical notes using transformer architectures ☆59 · Jun 27, 2022 · Updated 3 years ago
- The repository contains the code for analysing the leakage of personally identifiable (PII) information from the output of next word pred… ☆104 · Aug 13, 2024 · Updated last year
- Research simulation toolkit for federated learning ☆13 · Nov 7, 2020 · Updated 5 years ago
- KIND: an Italian Multi-Domain Dataset for Named Entity Recognition ☆14 · Jun 28, 2023 · Updated 2 years ago
- Unofficial Python client for Azure cognitive search ☆11 · Jun 7, 2019 · Updated 6 years ago
- Finds linguistic patterns effortlessly ☆39 · Aug 29, 2023 · Updated 2 years ago
- Library for identification, anonymization and de-anonymization of PII data ☆22 · Dec 26, 2022 · Updated 3 years ago
- Generate reports for spaCy models. ☆29 · May 27, 2022 · Updated 3 years ago
- ☆11 · Jun 25, 2024 · Updated last year
- Search for PII in Python ☆30 · Jan 29, 2024 · Updated 2 years ago
- Accompanying material to the course "Building Recommendation Systems in Azure" on the Microsoft Virtual Academy. ☆16 · Oct 2, 2015 · Updated 10 years ago
- aicreator for aidata ☆14 · May 17, 2023 · Updated 2 years ago
- A Python module that provides multiple anonymization techniques for text (This is only a prototype) ➡️ The project has moved to: https://… ☆26 · Mar 20, 2026 · Updated 3 weeks ago
- The code of EMNLP 2019 paper "A Split-and-Recombine Approach for Follow-up Query Analysis" ☆18 · Jul 20, 2023 · Updated 2 years ago
- SpanMarker for Named Entity Recognition ☆467 · Updated this week
- In browser active learning and guided search ☆17 · May 6, 2023 · Updated 2 years ago
- PyTorch ObjectDetection Modules and ONNX ops ☆18 · Jun 12, 2023 · Updated 2 years ago
- ☆10 · Jul 12, 2023 · Updated 2 years ago
- tempeh is a framework to TEst Machine learning PErformance exHaustively which includes tracking memory usage and run time. ☆18 · Jan 3, 2022 · Updated 4 years ago
- Official implementation for "Pruning Randomly Initialized Neural Networks with Iterative Randomization" ☆10 · Oct 5, 2021 · Updated 4 years ago
- Capstone project for Galvanize - Data Science Immersive. 'Project Plotline' looks at the emotional content of movie scripts (web scraping… ☆16 · Sep 27, 2016 · Updated 9 years ago
- Extract Molecular SMILES embeddings from language models pre-trained with various objectives architectures. ☆18 · Nov 9, 2023 · Updated 2 years ago
- An NLP pipeline for COVID-19 surveillance used in the Department of Veterans Affairs Biosurveillance. ☆15 · Oct 20, 2022 · Updated 3 years ago
- ☆17 · Jan 13, 2025 · Updated last year
- This is a prototype of a multi-lingual suite for named-entity recognition in Python. ➡️ The project has moved to: https://gitlab.opencode… ☆21 · Mar 20, 2026 · Updated 3 weeks ago
- skweak: A software toolkit for weak supervision applied to NLP tasks ☆926 · Sep 2, 2024 · Updated last year
- Public runnable examples of using John Snow Labs' OCR for Apache Spark. ☆93 · Apr 8, 2026 · Updated last week
- Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13 ☆209 · Mar 12, 2026 · Updated last month
- A spaCy wrapper for GliNER ☆134 · Jan 29, 2025 · Updated last year
- Language detection using Spacy and Fasttext ☆55 · Dec 17, 2023 · Updated 2 years ago
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) ☆3,060 · Mar 31, 2026 · Updated 2 weeks ago
- Clean personally identifiable information from dirty dirty text using spaCy. ☆41 · Sep 1, 2023 · Updated 2 years ago