thoughtworks-datakind / anonymizer
Library for identification, anonymization and de-anonymization of PII data
☆22 Updated last year
Related projects
Alternatives and complementary repositories for anonymizer
- How to do data science with Optimus, Spark and Python. ☆18 Updated 5 years ago
- A package to build an end-to-end pipeline for detecting personally identifiable information from text. ☆43 Updated 5 years ago
- Apache NiFi NLP Processor ☆18 Updated last year
- Data Lineage Tracing Library ☆22 Updated 2 years ago
- ☆25 Updated 5 years ago
- ☀️🦶 A lightweight framework for collaborative, open-source feature engineering ☆32 Updated 3 years ago
- 💻 CLI for reporting events to Faros platform ☆14 Updated last month
- Synthetic data generation for graph ML experiments ☆23 Updated 3 years ago
- Instant search for and access to many datasets in Pyspark. ☆34 Updated 2 years ago
- Python ELT Studio, an application for building ELT (and ETL) data flows. ☆57 Updated 2 years ago
- Application and python script to identify, remove, and/or recode personally identifiable information (PII) from field experiment datasets… ☆43 Updated 3 years ago
- Python bindings for Matroid API ☆16 Updated last month
- ☆21 Updated 8 years ago
- ReconNER, Debug annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality of your data. ☆34 Updated 4 years ago
- Record matching and entity resolution at scale in Spark ☆31 Updated last year
- An open source python library for automated prediction engineering ☆46 Updated this week
- plait.py - a fake data modeler ☆431 Updated 5 years ago
- DICOM handling for NiFi ☆12 Updated last week
- Build a semantic search application with deep learning models. ☆13 Updated last year
- Model explanation provides the ability to interpret the effect of the predictors on the composition of an individual score. ☆13 Updated 3 years ago
- Model drift detection ☆11 Updated last year
- A few end to end examples that use data-describe ☆16 Updated last year
- NSS Capstone project to use natural language modeling, classification, and information extraction to get the exact employee count values … ☆15 Updated 6 years ago
- MirrorDataGenerator is a python tool that generates synthetic data based on user-specified causal relations among features in the data. I… ☆19 Updated 2 years ago
- Neural Solr = Solr 9 + Mighty Inference + Node ☆16 Updated 2 years ago
- A Scalable Data Cleaning Library for PySpark. ☆26 Updated 5 years ago
- Spark NLP for Streamlit ☆15 Updated 3 years ago
- Apache NiFi Custom Processor for working with Stanford CoreNLP for Sentiment Analysis in Java 8 ☆11 Updated 6 years ago
- Generating Realistic Synthetic Data ☆31 Updated 9 months ago
- Machine Learning Deployment for Kubernetes ☆18 Updated 11 months ago