thoughtworks-datakind / anonymizer
Library for identification, anonymization and de-anonymization of PII data
☆21 · Updated 2 years ago
Alternatives and similar repositories for anonymizer:
Users interested in anonymizer are comparing it to the libraries listed below.
- How to do data science with Optimus, Spark and Python. ☆19 · Updated 5 years ago
- Data Lineage Tracing Library ☆22 · Updated 3 years ago
- ☆21 · Updated 8 years ago
- A few end to end examples that use data-describe ☆16 · Updated last year
- A Scalable Data Cleaning Library for PySpark. ☆26 · Updated 5 years ago
- Synthetic data generation for graph ML experiments ☆23 · Updated 4 years ago
- ElasticSearch implementation of MlFlow tracking store ☆18 · Updated 4 years ago
- ☆12 · Updated 4 years ago
- MirrorDataGenerator is a python tool that generates synthetic data based on user-specified causal relations among features in the data. I… ☆21 · Updated 2 years ago
- Basic tutorial of using Apache Airflow ☆36 · Updated 6 years ago
- Model explanation provides the ability to interpret the effect of the predictors on the composition of an individual score. ☆13 · Updated 4 years ago
- Instant search for and access to many datasets in Pyspark. ☆34 · Updated 2 years ago
- Automated Continuous Data Quality Measurement ☆12 · Updated last year
- A package to build an end-to-end pipeline for detecting personally identifiable information from text. ☆43 · Updated 5 years ago
- Code examples for the Introduction to Kubeflow course ☆14 · Updated 4 years ago
- 💻 CLI for reporting events to Faros platform ☆14 · Updated 3 months ago
- ☆13 · Updated 2 years ago
- Record matching and entity resolution at scale in Spark ☆34 · Updated last year
- Apache NiFi NLP Processor ☆18 · Updated last year
- Machine Learning with Elastic Stack - Second Edition, published by Packt ☆15 · Updated 3 years ago
- Simple template showing how to set up docker for reproducible data science with Jupyter notebooks. ☆22 · Updated 8 months ago
- Python ELT Studio, an application for building ELT (and ETL) data flows. ☆57 · Updated 3 years ago
- This repository auto-configures an Apache Pinot and Superset cluster for analyzing IRA tweets from FiveThirtyEight. ☆11 · Updated 4 years ago
- Streamlit example showing Scikit Learn & Pyspark ML over healthcare data. It's simple! ☆30 · Updated 4 years ago
- ☆10 · Updated 2 years ago
- 🦖 Streamlined Recommender Systems with TensorFlow and KubeFlow ☆18 · Updated last year
- Generating Realistic Synthetic Data ☆33 · Updated last year
- ☆22 · Updated 2 years ago
- ☆25 · Updated 6 years ago
- A curated list of awesome open source tools and commercial products to catalog, version, and manage data 🚀 ☆30 · Updated 2 years ago