HoloClean / holoclean
A Machine Learning System for Data Enrichment.
★517 · Updated last year
Related projects
Alternatives and complementary repositories for holoclean
- More interactive weak supervision with FlyingSquid ★314 · Updated 4 years ago
- Flow with FlorDB ★151 · Updated 2 months ago
- Scalable identity resolution, entity resolution, data mastering and deduplication using ML ★957 · Updated this week
- Luminaire is a python package that provides ML driven solutions for monitoring time series data. ★763 · Updated 9 months ago
- A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels f… ★497 · Updated this week
- A collection of tutorials for Snorkel ★392 · Updated 3 weeks ago
- Data Analysis Baseline Library ★725 · Updated 3 months ago
- Tabular feature encoding pipelines for machine learning with options for string parsing, missing data infill, and stochastic perturbation… ★165 · Updated 2 months ago
- Type System for Data Analysis in Python ★208 · Updated 3 months ago
- Hopsworks - Data-Intensive AI platform with a Feature Store ★1,162 · Updated last week
- What's in your data? Extract schema, statistics and entities from datasets ★1,433 · Updated 4 months ago
- ★74 · Updated last year
- Interpret Community extends Interpret repository with additional interpretability techniques and utility functions to handle real-world d… ★421 · Updated 5 months ago
- Monitor the stability of a Pandas or Spark dataframe ★495 · Updated last month
- A list of free data matching and record linkage software. ★362 · Updated 8 months ago
- A Benchmark for Joint Data Cleaning and Machine Learning ★44 · Updated 4 months ago
- Algorithms for outlier, adversarial and drift detection ★2,241 · Updated 2 weeks ago
- Python library for building highly effective data science workflows ★952 · Updated last year
- python automatic data quality check toolkit ★285 · Updated 4 years ago
- ★185 · Updated 5 months ago
- Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io ★1,906 · Updated last week
- The stupidly simple CLI workspace for your data warehouse. ★724 · Updated last year
- DeltaPy - Tabular Data Augmentation (by @firmai) ★536 · Updated last year
- Open Source ML Model Versioning, Metadata, and Experiment Management ★1,700 · Updated 3 months ago
- Coarse-grained lineage and tracing for machine learning pipelines. ★466 · Updated 2 years ago
- A powerful and modular toolkit for record linkage and duplicate detection in Python ★963 · Updated 8 months ago
- Data ingestion library for Amundsen to build graph and search index ★206 · Updated 7 months ago
- Automatically labeling training data ★106 · Updated 5 years ago
- Implementation of statistical models to analyze time lagged conversions ★258 · Updated 5 months ago