A Machine Learning System for Data Enrichment.
☆533 · Jul 20, 2023 · Updated 2 years ago
Alternatives and similar repositories for holoclean
Users who are interested in holoclean are comparing it to the libraries listed below.
- The BART Project: Benchmarking Algorithms for (data) Repairing and Translation ☆42 · Nov 27, 2023 · Updated 2 years ago
- A system for quickly generating training data with weak supervision ☆5,939 · May 2, 2024 · Updated last year
- A Tree Search Library for Data Cleaning ☆22 · Feb 15, 2022 · Updated 4 years ago
- FDX, SIGMOD 2020 ☆20 · May 3, 2024 · Updated last year
- Library for exploring and validating machine learning data ☆779 · Jun 23, 2025 · Updated 8 months ago
- Always know what to expect from your data. ☆11,162 · Feb 20, 2026 · Updated last week
- Source code for several Metanome data profiling algorithms ☆59 · May 15, 2023 · Updated 2 years ago
- Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting… ☆4,740 · Feb 19, 2026 · Updated last week
- Python package for performing Entity and Text Matching using Deep Learning. ☆614 · Jun 18, 2024 · Updated last year
- Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and… ☆10,768 · Updated this week
- Modin: Scale your Pandas workflows by changing a single line of code ☆10,362 · Feb 10, 2026 · Updated 2 weeks ago
- An open source python library for automated feature engineering ☆7,614 · Feb 3, 2026 · Updated 3 weeks ago
- Build, Manage and Deploy AI/ML Systems ☆9,863 · Updated this week
- TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows… ☆2,272 · Sep 29, 2023 · Updated 2 years ago
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… ☆8,475 · Feb 5, 2026 · Updated 3 weeks ago
- Data-Centric Pipelines and Data Versioning ☆6,286 · Feb 3, 2025 · Updated last year
- The Llunatic Mapping and Cleaning Chase Engine ☆37 · Jan 12, 2024 · Updated 2 years ago
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,583 · Feb 17, 2026 · Updated last week
- What's in your data? Extract schema, statistics and entities from datasets ☆1,545 · Sep 26, 2025 · Updated 5 months ago
- Brushing and linking for big data ☆972 · Dec 2, 2025 · Updated 3 months ago
- 📚 Parameterize, execute, and analyze notebooks ☆6,388 · Jan 5, 2026 · Updated last month
- Provide functionality to build statistical models to repair dirty tabular data in Spark ☆12 · Apr 21, 2023 · Updated 2 years ago
- Production infrastructure for machine learning at scale ☆8,031 · Jun 12, 2024 · Updated last year
- 🦉 Data Versioning and ML Experiments ☆15,404 · Updated this week
- An Open Standard for lineage metadata collection ☆2,330 · Updated this week
- A powerful and modular toolkit for record linkage and duplicate detection in Python ☆1,046 · Feb 21, 2024 · Updated 2 years ago
- Lime: Explaining the predictions of any machine learning classifier ☆12,101 · Jul 25, 2024 · Updated last year
- Algorithms for explaining machine learning models ☆2,612 · Oct 17, 2025 · Updated 4 months ago
- A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. ☆10,049 · Sep 11, 2025 · Updated 5 months ago
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution. ☆66 · Updated this week
- A comprehensive benchmark for data cleaning methods and their impact on ML models ☆15 · Jul 24, 2024 · Updated last year
- DeltaPy - Tabular Data Augmentation (by @firmai) ☆556 · Sep 19, 2023 · Updated 2 years ago
- Scalable identity resolution, entity resolution, data mastering and deduplication using ML ☆1,156 · Updated this week