HoloClean / holoclean
A Machine Learning System for Data Enrichment.
☆519 · Updated last year
Alternatives and similar repositories for holoclean:
Users who are interested in holoclean are comparing it to the libraries listed below.
- More interactive weak supervision with FlyingSquid ☆315 · Updated 4 years ago
- A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels f… ☆504 · Updated 2 months ago
- Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark ☆1,494 · Updated 2 months ago
- Tabular feature encoding pipelines for machine learning with options for string parsing, missing data infill, and stochastic perturbation… ☆164 · Updated 2 weeks ago
- Flow with FlorDB 🌻 ☆154 · Updated 2 weeks ago
- Python package for performing Entity and Text Matching using Deep Learning. ☆574 · Updated 8 months ago
- Library for exploring and validating machine learning data ☆768 · Updated 3 weeks ago
- A distributed Spark/Scala implementation of the isolation forest algorithm for unsupervised outlier detection, featuring support for scal… ☆237 · Updated 2 months ago
- ☆188 · Updated 8 months ago
- Type System for Data Analysis in Python ☆210 · Updated 2 weeks ago
- Python library for building highly effective data science workflows ☆948 · Updated last year
- python automatic data quality check toolkit ☆284 · Updated 4 years ago
- What's in your data? Extract schema, statistics and entities from datasets ☆1,458 · Updated 2 weeks ago
- ☆75 · Updated last year
- Train and run Pytorch models on Apache Spark. ☆340 · Updated last year
- Coarse-grained lineage and tracing for machine learning pipelines. ☆467 · Updated 2 years ago
- Joblib Apache Spark Backend ☆245 · Updated 6 months ago
- DeltaPy - Tabular Data Augmentation (by @firmai) ☆542 · Updated last year
- Implementation of statistical models to analyze time lagged conversions ☆261 · Updated 8 months ago
- DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai) ☆204 · Updated 3 years ago
- Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020) ☆708 · Updated last year
- GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs ☆1,021 · Updated this week
- A collection of tutorials for Snorkel ☆394 · Updated 3 months ago
- This is a repo documenting the best practices in PySpark. ☆462 · Updated 2 years ago
- TypeDB-ML is the Machine Learning integrations library for TypeDB ☆549 · Updated last year
- ☆96 · Updated 4 years ago
- The complete graph data science platform ☆139 · Updated 2 weeks ago
- python library for automated dataset normalization ☆113 · Updated last year
- A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton ☆862 · Updated last year
- Python API for Deequ ☆744 · Updated 4 months ago