HoloClean / holoclean
A Machine Learning System for Data Enrichment.
☆518 · Updated last year
Alternatives and similar repositories for holoclean:
Users who are interested in holoclean are comparing it to the libraries listed below:
- A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels f… ☆504 · Updated 4 months ago
- More interactive weak supervision with FlyingSquid ☆316 · Updated 4 years ago
- Tabular feature encoding pipelines for machine learning with options for string parsing, missing data infill, and stochastic perturbation… ☆164 · Updated last month
- DeltaPy - Tabular Data Augmentation (by @firmai) ☆544 · Updated last year
- Flow with FlorDB 🌻 ☆155 · Updated last month
- Type System for Data Analysis in Python ☆211 · Updated last month
- A Benchmark for Joint Data Cleaning and Machine Learning ☆46 · Updated 9 months ago
- Python library for automated dataset normalization ☆113 · Updated last year
- Coarse-grained lineage and tracing for machine learning pipelines. ☆467 · Updated 2 years ago
- A distributed Spark/Scala implementation of the isolation forest algorithm for unsupervised outlier detection, featuring support for scal… ☆239 · Updated this week
- Python package for performing Entity and Text Matching using Deep Learning. ☆585 · Updated 9 months ago
- Interpret Community extends Interpret repository with additional interpretability techniques and utility functions to handle real-world d… ☆426 · Updated last month
- Python automatic data quality check toolkit ☆283 · Updated 4 years ago
- An open source, high scalability toolkit in Java for Entity Resolution. ☆218 · Updated 11 months ago
- Resources for tackling record linkage / deduplication / data matching problems ☆122 · Updated last year
- Human-explainable AI. ☆518 · Updated last year
- Python library for building highly effective data science workflows ☆950 · Updated last year
- A model-agnostic visual debugging tool for machine learning ☆1,657 · Updated last month
- A collection of tutorials for Snorkel ☆394 · Updated 4 months ago
- Scalable identity resolution, entity resolution, data mastering and deduplication using ML ☆1,007 · Updated this week
- The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large scale machine learning wo… ☆170 · Updated last year
- DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai) ☆204 · Updated 3 years ago
- Implementation of statistical models to analyze time lagged conversions ☆261 · Updated 10 months ago
- Open Source ML Model Versioning, Metadata, and Experiment Management ☆1,720 · Updated 8 months ago
- A library for composing end-to-end tunable machine learning pipelines. ☆116 · Updated last month
- Distributed scikit-learn meta-estimators in PySpark ☆284 · Updated 11 months ago
- Bias Auditing & Fair ML Toolkit ☆710 · Updated last week
- FDX, SIGMOD 2020 ☆19 · Updated 10 months ago