This package features data-science-related tasks for developing new recognizers for Presidio. It is used to evaluate the entire system, as well as specific PII recognizers and PII detection models.
☆277 · Apr 20, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for presidio-research
Users who are interested in presidio-research compare it to the libraries listed below.
- A package to build an end-to-end pipeline for detecting personally identifiable information from text. ☆49 · Jun 2, 2019 · Updated 6 years ago
- Annotated corpus + evaluation metrics for text anonymisation ☆74 · Jan 19, 2026 · Updated 3 months ago
- A research python package for detecting, categorizing, and assessing the severity of personal identifiable information (PII) ☆98 · Feb 15, 2026 · Updated 2 months ago
- Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub ☆340 · Jan 5, 2024 · Updated 2 years ago
- A project to build a machine learning pipeline to detect personal identifiable information (PII) ☆16 · Dec 8, 2022 · Updated 3 years ago
- Robust de-identification of medical notes using transformer architectures ☆59 · Jun 27, 2022 · Updated 3 years ago
- Research simulation toolkit for federated learning ☆13 · Nov 7, 2020 · Updated 5 years ago
- KIND: an Italian Multi-Domain Dataset for Named Entity Recognition ☆14 · Jun 28, 2023 · Updated 2 years ago
- Finds linguistic patterns effortlessly ☆39 · Aug 29, 2023 · Updated 2 years ago
- Library for identification, anonymization and de-anonymization of PII data ☆22 · Dec 26, 2022 · Updated 3 years ago
- Generate reports for spaCy models. ☆29 · May 27, 2022 · Updated 3 years ago
- ☆12 · Jun 25, 2024 · Updated last year
- The code of EMNLP 2019 paper "A Split-and-Recombine Approach for Follow-up Query Analysis" ☆18 · Jul 20, 2023 · Updated 2 years ago
- SpanMarker for Named Entity Recognition ☆472 · Apr 10, 2026 · Updated 3 weeks ago
- ☆10 · Jul 12, 2023 · Updated 2 years ago
- This is the implementation of the TextNAS algorithm proposed in the paper TextNAS: A Neural Architecture Search Space tailored for Text R… ☆15 · Nov 28, 2022 · Updated 3 years ago
- tempeh is a framework to TEst Machine learning PErformance exHaustively which includes tracking memory usage and run time. ☆18 · Jan 3, 2022 · Updated 4 years ago
- Microsoft Cognitive Services, Computer Vision API, OCR Visualizer on documents ☆19 · Dec 8, 2022 · Updated 3 years ago
- Serverless Orchestrator of Serverless Workers for AWS - Worker ☆17 · Apr 14, 2026 · Updated 3 weeks ago
- The note taking app that doesn't suck ☆16 · Apr 27, 2026 · Updated last week
- Capstone project for Galvanize - Data Science Immersive. 'Project Plotline' looks at the emotional content of movie scripts (web scraping… ☆16 · Sep 27, 2016 · Updated 9 years ago
- This is a prototype of a multi-lingual suite for named-entity recognition in Python. ➡️ The project has moved to: https://gitlab.opencode… ☆21 · Mar 20, 2026 · Updated last month
- skweak: A software toolkit for weak supervision applied to NLP tasks ☆926 · Sep 2, 2024 · Updated last year
- A Python interface for NIH Reporter APIs ☆12 · Feb 4, 2025 · Updated last year
- Public runnable examples of using John Snow Labs' OCR for Apache Spark. ☆93 · Apr 8, 2026 · Updated 3 weeks ago
- Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13 ☆213 · Mar 12, 2026 · Updated last month
- A spaCy wrapper for GliNER ☆134 · Jan 29, 2025 · Updated last year
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) ☆3,132 · Updated this week
- This sample project shows off how to prepare and deploy to Azure Web Apps a simple Python web service with an image classifying model pro… ☆26 · Feb 5, 2018 · Updated 8 years ago
- Demonstrate samples and good engineering practice for operationalizing machine learning solutions. ☆20 · Dec 2, 2021 · Updated 4 years ago
- 🧪 Cutting-edge experimental spaCy components and features ☆105 · Apr 23, 2024 · Updated 2 years ago
- CLK hash: hash pii for entity matching ☆47 · May 12, 2025 · Updated 11 months ago
- A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of lang… ☆1,573 · Jun 12, 2025 · Updated 10 months ago
- Live survey of off-the-shelf language identification tools for python ☆27 · Apr 13, 2022 · Updated 4 years ago
- spaCy pipeline object for negating concepts in text ☆282 · Apr 20, 2026 · Updated 2 weeks ago
- Provide an easy way with Python to protect your data sources by searching its metadata. 🛡️ ☆18 · Apr 21, 2026 · Updated 2 weeks ago
- ☆25 · May 30, 2025 · Updated 11 months ago
- Dataframe Integration with spaCy. ☆103 · Mar 12, 2021 · Updated 5 years ago
- Solution accelerator built on Azure OpenAI Service and Azure AI Document Intelligence to process and extract summaries, entities, and met… ☆213 · Apr 30, 2026 · Updated last week