microsoft / presidio-research
This package features data-science related tasks for developing new recognizers for Presidio. It is used for the evaluation of the entire system, as well as for evaluating specific PII recognizers or PII detection models.
☆177 Updated this week
Related projects
Alternatives and complementary repositories for presidio-research
- SpanMarker for Named Entity Recognition ☆403 Updated 3 months ago
- A research Python package for detecting, categorizing, and assessing the severity of personally identifiable information (PII) ☆74 Updated last year
- Annotated corpus + evaluation metrics for text anonymisation ☆51 Updated 9 months ago
- A package to build an end-to-end pipeline for detecting personally identifiable information from text. ☆43 Updated 5 years ago
- This repository contains an easy and intuitive approach to few-shot NER using most similar expansion over spaCy embeddings. Now with enti… ☆242 Updated last year
- 📚 Process PDFs, Word documents and more with spaCy ☆104 Updated this week
- A Python library to de-identify medical records with state-of-the-art NLP methods. ☆120 Updated last year
- Robust de-identification of medical notes using transformer architectures ☆45 Updated 2 years ago
- A library to synthesize text datasets using Large Language Models (LLM) ☆151 Updated last year
- Fuzzy matching and more functionality for spaCy. ☆252 Updated 4 months ago
- Zero and Few shot named entity & relationships recognition ☆349 Updated 2 months ago
- SpikeX - SpaCy Pipes for Knowledge Extraction ☆398 Updated 3 years ago
- This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-s… ☆209 Updated 5 months ago
- Explainable Zero-Shot Topic Extraction ☆61 Updated 3 months ago
- A curated list of awesome open source tools and commercial products for monitoring data quality, monitoring model performance, and profil… ☆64 Updated 6 months ago
- 🍳 Recipes for the Prodigy, our fully scriptable annotation tool ☆480 Updated 3 months ago
- A library that incorporates state-of-the-art explainers for text-based machine learning models and visualizes the result with a built-in … ☆416 Updated 9 months ago
- ✨ Bootstrap annotation with zero- & few-shot learning via OpenAI GPT-3 ☆320 Updated last year
- All the go-to functions you need to handle NLP use-cases, integrated in NLPretext ☆139 Updated 7 months ago
- Models and Pipelines for the Spark NLP library ☆112 Updated 3 years ago
- Set of vectorizers that extract keyphrases with part-of-speech patterns from a collection of text documents and convert them into a docum… ☆254 Updated 2 weeks ago
- skweak: A software toolkit for weak supervision applied to NLP tasks ☆920 Updated 2 months ago
- Information extraction from English and German texts based on predicate logic ☆135 Updated last year
- Evaluation of language models on mono- or multilingual tasks. ☆75 Updated last week
- A Python library aimed at dissecting and augmenting NER training data. ☆56 Updated last year
- A Python library for calculating a large variety of metrics from text ☆315 Updated last month
- Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further lang… ☆118 Updated 6 months ago
- Find and fix bugs in natural language machine learning models using adaptive testing. ☆182 Updated 6 months ago
- spaCy NER annotator using ipywidgets ☆121 Updated 7 months ago
- BERTje is a Dutch pre-trained BERT model developed at the University of Groningen. (EMNLP Findings 2020) "What’s so special about BERT’s … ☆135 Updated last year