HoloClean/holoclean

A Machine Learning System for Data Enrichment.

☆539

Alternatives and similar repositories for holoclean

Users who are interested in holoclean are comparing it to the libraries listed below.

BigDaMa / ExampleDrivenErrorDetection
☆12 · Jun 1, 2021 · Updated 5 years ago
dbunibas / BART
The BART Project: Benchmarking Algorithms for (data) Repairing and Translation
☆43 · Nov 27, 2023 · Updated 2 years ago
BigDaMa / raha
☆68 · Jun 23, 2026 · Updated 3 weeks ago
sis-ethz / Profiler-Public
FDX, SIGMOD 2020
☆20 · May 3, 2024 · Updated 2 years ago
sjyk / alphaclean
A Tree Search Library for Data Cleaning
☆22 · Feb 15, 2022 · Updated 4 years ago
WelkinNi / Automatic-Data-Repair
☆15 · Mar 6, 2025 · Updated last year
snorkel-team / snorkel
A system for quickly generating training data with weak supervision
☆5,992 · Jun 8, 2026 · Updated last month
maropu / spark-data-repair-plugin
Provide functionality to build statistical models to repair dirty tabular data in Spark
☆12 · Apr 21, 2023 · Updated 3 years ago
HoloClean / HoloClean-Legacy-deprecated
A Machine Learning System for Data Enrichment.
☆76 · Sep 15, 2018 · Updated 7 years ago
mohamedyd / rein-benchmark
A comprehensive benchmark for data cleaning methods and their impact of ML models
☆16 · Jul 24, 2024 · Updated last year
HPI-Information-Systems / metanome-algorithms
Source code for several Metanome data profiling algorithms
☆58 · May 15, 2023 · Updated 3 years ago
donatellosantoro / Llunatic
The Llunatic Mapping and Cleaning Chase Engine
☆38 · Jan 12, 2024 · Updated 2 years ago
tensorflow / data-validation
Library for exploring and validating machine learning data
☆782 · Jun 11, 2026 · Updated last month
anhaidgroup / py_entitymatching
☆193 · May 29, 2024 · Updated 2 years ago
j-r77 / cfddiscovery
☆11 · Oct 31, 2019 · Updated 6 years ago
fivetran / great_expectations
Always know what to expect from your data.
☆11,660 · Updated this week
anhaidgroup / deepmatcher
Python package for performing Entity and Text Matching using Deep Learning.
☆621 · Jun 18, 2024 · Updated 2 years ago
amundsen-io / amundsen
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting…
☆4,782 · Jul 1, 2026 · Updated 3 weeks ago
sjyk / datacleaning-benchmark
☆40 · Aug 31, 2016 · Updated 9 years ago
J535D165 / recordlinkage
A powerful and modular toolkit for record linkage and duplicate detection in Python
☆1,055 · Feb 21, 2024 · Updated 2 years ago
kedro-org / kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and…
☆10,930 · Updated this week
modin-project / modin
Modin: Scale your Pandas workflows by changing a single line of code
☆10,395 · Feb 10, 2026 · Updated 5 months ago
BenevolentAI / RELVM
This repository contains the code accompanying the paper "Learning Informative Representations of Biomedical Relations with Latent Variab…
☆15 · Sep 28, 2021 · Updated 4 years ago
HPI-Information-Systems / Metanome
The source repository of the Metanome tool
☆192 · Jun 5, 2025 · Updated last year
weehyong / MLScore
Know your ML Score based on Sculley's paper
☆34 · Apr 22, 2019 · Updated 7 years ago
awslabs / deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
☆3,636 · Updated this week
alteryx / featuretools
An open source python library for automated feature engineering
☆7,665 · Updated this week
OpenLineage / OpenLineage
An Open Standard for lineage metadata collection
☆2,555 · Updated this week
Netflix / metaflow
Build, Manage and Deploy AI/ML Systems
☆10,190 · Updated this week
vega / falcon
Brushing and linking for big data
☆973 · Jul 2, 2026 · Updated 2 weeks ago
robustness-gym / robustness-gym
Robustness Gym is an evaluation toolkit for machine learning.
☆446 · Jun 28, 2022 · Updated 4 years ago
cortexlabs / cortex
Production infrastructure for machine learning at scale
☆8,012 · Jun 12, 2024 · Updated 2 years ago
SFIG611 / tabbie
☆60 · Aug 17, 2022 · Updated 3 years ago
megagonlabs / sato
Code and data for Sato https://arxiv.org/abs/1911.06311.
☆118 · Feb 23, 2024 · Updated 2 years ago
treeverse / dvc
🦉 Data Versioning and ML Experiments
☆15,768 · Updated this week
vaexio / vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s…
☆8,509 · Apr 1, 2026 · Updated 3 months ago
PasaLab / DIFER
☆15 · May 26, 2022 · Updated 4 years ago
nteract / papermill
📚 Parameterize, execute, and analyze notebooks
☆6,459 · Jul 6, 2026 · Updated 2 weeks ago
pachyderm / pachyderm
Data-Centric Pipelines and Data Versioning
☆6,297 · Feb 3, 2025 · Updated last year