HoloClean / holoclean
A Machine Learning System for Data Enrichment.
☆518 · Updated last year
Alternatives and similar repositories for holoclean:
Users interested in holoclean are comparing it to the libraries listed below:
- A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels f… ☆505 · Updated 3 months ago
- Python toolkit for automatic data quality checks ☆283 · Updated 4 years ago
- Tabular feature encoding pipelines for machine learning with options for string parsing, missing data infill, and stochastic perturbation… ☆164 · Updated last month
- DeltaPy - Tabular Data Augmentation (by @firmai) ☆544 · Updated last year
- More interactive weak supervision with FlyingSquid ☆315 · Updated 4 years ago
- Flow with FlorDB 🌻 ☆154 · Updated last month
- Data Analysis Baseline Library ☆728 · Updated 2 months ago
- Type System for Data Analysis in Python ☆210 · Updated last month
- Coarse-grained lineage and tracing for machine learning pipelines. ☆467 · Updated 2 years ago
- DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai) ☆205 · Updated 3 years ago
- ☆188 · Updated 9 months ago
- A Benchmark for Joint Data Cleaning and Machine Learning ☆46 · Updated 8 months ago
- 🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex) ☆140 · Updated last year
- Auto Tune Models - A multi-tenant, multi-data system for automated machine learning (model selection and tuning). ☆530 · Updated 5 years ago
- Library for Semi-Automated Data Science ☆335 · Updated 5 months ago
- Python library for automated dataset normalization ☆113 · Updated last year
- A graph-based functional API for building complex scikit-learn pipelines. ☆591 · Updated 2 years ago
- A distributed Spark/Scala implementation of the isolation forest algorithm for unsupervised outlier detection, featuring support for scal… ☆238 · Updated this week
- Python library for building highly effective data science workflows ☆949 · Updated last year
- openclean - Data Cleaning and data profiling library for Python ☆73 · Updated 3 years ago
- TypeDB-ML is the Machine Learning integrations library for TypeDB ☆550 · Updated last year
- ☆96 · Updated 4 years ago
- This repository contains source code for the TaBERT model, a pre-trained language model for learning joint representations of natural lan… ☆594 · Updated 3 years ago
- Random dataframe and database table generator ☆308 · Updated 3 years ago
- ☆76 · Updated 2 years ago
- Implementation of statistical models to analyze time lagged conversions ☆261 · Updated 9 months ago
- Resources for Data Science Process management ☆204 · Updated 5 years ago
- Source code/webpage/demos for the What-If Tool ☆939 · Updated 6 months ago
- Feature engineering and machine learning: together at last! ☆24 · Updated 4 years ago
- Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark ☆1,496 · Updated 3 months ago