VIDA-NYU / openclean
openclean - Data cleaning and data profiling library for Python
☆79 Updated 3 years ago
Alternatives and similar repositories for openclean
Users that are interested in openclean are comparing it to the libraries listed below
- SPEAR: Programmatically label and build training data quickly. ☆107 Updated 11 months ago
- Jenga is an experimentation library that allows data science practitioners and researchers to study the effect of common data corruptio… ☆40 Updated 2 years ago
- 🐍 Material for PyData Global 2021 Presentation: Effective Testing for Machine Learning Projects ☆81 Updated 3 years ago
- ForML - A development framework and MLOps platform for the lifecycle management of data science projects ☆107 Updated 2 years ago
- A curated list of awesome open source tools and commercial products for monitoring data quality, monitoring model performance, and profil… ☆78 Updated last year
- DataFrame support for scikit-learn. ☆63 Updated last year
- Record matching and entity resolution at scale in Spark ☆34 Updated last year
- real-time data + ML pipeline ☆54 Updated this week
- Inspect ML Pipelines in Python in the form of a DAG ☆70 Updated last year
- A library of Reversible Data Transforms ☆127 Updated this week
- A Kedro plugin that provides pandas dropin replacements for the pandas datasets (e.g modin and cuDF) ☆12 Updated 4 years ago
- Instant search for and access to many datasets in Pyspark. ☆34 Updated 2 years ago
- An abstraction layer for parameter tuning ☆35 Updated 9 months ago
- Python package for deduplication/entity resolution using active learning ☆80 Updated 10 months ago
- Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊 ☆79 Updated 9 months ago
- Lossless in-memory compression of pandas DataFrames and Series powered by the visions type system. Up to 10x less RAM needed for the same… ☆29 Updated 2 years ago
- Pipeline components that support partial_fit. ☆46 Updated 11 months ago
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models. ☆42 Updated 2 years ago
- 🦫 MLOps for (online) machine learning ☆86 Updated last year
- TigerLily: Finding drug interactions in silico with the Graph. ☆100 Updated 2 years ago
- A more flexible alternative to scikit-learn Pipelines ☆36 Updated last week
- Kedro Plugin to support running workflows on Kubeflow Pipelines ☆54 Updated 9 months ago
- A general purpose recommender metrics library for fair evaluation. ☆280 Updated last year
- Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning ☆51 Updated 2 years ago
- Sensible multi-core apply function for Pandas ☆84 Updated last month
- Automatically export Jupyter notebooks to various file formats (.py, .html, and more) on save. ☆77 Updated last year
- Playground for using large language models in the Modern Data Stack for entity matching ☆108 Updated 2 years ago
- 🚕 Self-contained demo using Redpanda, Materialize, River, Redis, and Streamlit to predict taxi trip durations ☆46 Updated 2 years ago
- ⚓ Eurybia monitors model drift over time and securizes model deployment with data validation ☆210 Updated 8 months ago
- mercury-graph is a Python library that offers graph analytics capabilities with a technology-agnostic API. ☆30 Updated 3 months ago