VIDA-NYU / openclean
openclean - Data Cleaning and data profiling library for Python
☆79 · Updated 3 years ago
Alternatives and similar repositories for openclean
Users that are interested in openclean are comparing it to the libraries listed below
- Inspect ML Pipelines in Python in the form of a DAG ☆70 · Updated last year
- Playground for using large language models into the Modern Data Stack for entity matching ☆107 · Updated 2 years ago
- TigerLily: Finding drug interactions in silico with the Graph. ☆100 · Updated 2 years ago
- ForML - A development framework and MLOps platform for the lifecycle management of data science projects ☆106 · Updated 2 years ago
- Jenga is an experimentation library that allows data science practitioners and researchers to study the effect of common data corruptio… ☆40 · Updated last year
- A library of Reversible Data Transforms ☆127 · Updated last week
- Type System for Data Analysis in Python ☆212 · Updated 4 months ago
- Automatically export Jupyter notebooks to various file formats (.py, .html, and more) on save. ☆77 · Updated last year
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆29 · Updated 5 months ago
- Template-based generation of DAG cards from Metaflow classes, inspired by Google cards for machine learning models. ☆30 · Updated 3 years ago
- Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning ☆51 · Updated 2 years ago
- Kedro Plugin to support running workflows on Kubeflow Pipelines ☆54 · Updated 9 months ago
- An open source automl library for using machine learning in healthcare. ☆118 · Updated last year
- Record matching and entity resolution at scale in Spark ☆34 · Updated last year
- Python package for deduplication/entity resolution using active learning ☆80 · Updated 9 months ago
- Find data quality issues and clean your data in a single line of code with a Scikit-Learn compatible Transformer. ☆130 · Updated last year
- Explore and compare 1K+ accurate decision trees in your browser! ☆162 · Updated last year
- real-time data + ML pipeline ☆54 · Updated this week
- An abstraction layer for parameter tuning ☆35 · Updated 9 months ago
- this repo might get accepted ☆28 · Updated 4 years ago
- A scikit-learn compatible estimator based on business-rules with interactive dashboard included ☆28 · Updated 3 years ago
- 🐍 Material for PyData Global 2021 Presentation: Effective Testing for Machine Learning Projects ☆81 · Updated 3 years ago
- A tutorial on how to use kedro-mlflow plugin (https://github.com/Galileo-Galilei/kedro-mlflow) to synchronize training and inference and … ☆39 · Updated 2 years ago
- Identify bias and measure fairness of your data ☆92 · Updated 2 months ago
- ☆96 · Updated 5 years ago
- Missing data amputation and exploration functions for Python ☆70 · Updated 2 years ago
- Confusion Matrix in Python: plot a pretty confusion matrix (like Matlab) in python using seaborn and matplotlib ☆19 · Updated 3 years ago
- 🍦 Deployment tool for online machine learning models ☆97 · Updated 3 years ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆113 · Updated last year
- automatic data slicing ☆34 · Updated 3 years ago