LaureBerti / Learn2Clean
Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning
☆50 Updated last year
Related projects
Alternatives and complementary repositories for Learn2Clean
- An abstraction layer for parameter tuning ☆36 Updated 2 months ago
- Python library to explain Tree Ensemble models (TE) like XGBoost, using a rule list. ☆44 Updated 6 months ago
- Record matching and entity resolution at scale in Spark ☆31 Updated last year
- ☆20 Updated last year
- openclean - Data Cleaning and data profiling library for Python ☆69 Updated 3 years ago
- Jenga is an experimentation library that allows data science practitioners and researchers to study the effect of common data corruptio… ☆35 Updated last year
- A Python library for hierarchical classification compatible with scikit-learn ☆114 Updated 3 months ago
- A collection of data sets for stream learning. ☆32 Updated 4 years ago
- Similarity encoding of dirty categorical variables (strings) ☆20 Updated 5 years ago
- Data Cleaning for ML under the Certain Prediction Framework ☆11 Updated 2 years ago
- scikit-mine: pattern mining in Python ☆72 Updated last year
- A Benchmark for Joint Data Cleaning and Machine Learning ☆44 Updated 5 months ago
- Welcome to Snowman App – a Data Matching Benchmark Platform. ☆37 Updated last year
- ☆29 Updated 3 years ago
- Inspect ML Pipelines in Python in the form of a DAG ☆69 Updated 8 months ago
- Sketch and LSH Index library for Java, including OPH methods as well as the Lazo method ☆13 Updated 10 months ago
- The complete graph data science platform ☆139 Updated last week
- An automated machine learning tool aimed to facilitate AutoML research. ☆96 Updated 2 months ago
- An easier approach to using and understanding ML models ☆20 Updated last month
- Dataset search engine, discovering data from a variety of sources, profiling it, and allowing advanced queries on the index ☆41 Updated last year
- ☆15 Updated 2 years ago
- Python package for Gower distance ☆75 Updated 6 months ago
- MinHash implementation in Python ☆11 Updated 2 months ago
- Unified slicing for all Python data structures. ☆36 Updated 8 months ago
- Python package for deduplication/entity resolution using active learning ☆78 Updated 2 months ago
- CinnaMon is a Python library which offers a number of tools to detect, explain, and correct data drift in a machine learning system ☆76 Updated last year
- Editing machine learning models to reflect human knowledge and values ☆123 Updated last year
- ☆17 Updated last year
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆26 Updated this week