schelterlabs / jenga
Jenga is an experimentation library that allows data science practitioners and researchers to study the effect of common data corruptions (e.g., missing values, broken character encodings) on the prediction quality of their ML models.
☆41 Updated 2 years ago
Alternatives and similar repositories for jenga
Users who are interested in jenga are comparing it to the libraries listed below.
- Inspect ML Pipelines in Python in the form of a DAG ☆70 Updated last year
- Code repository for our paper "Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift": https://arxiv.org/abs/1810.119… ☆106 Updated last year
- A Benchmark for Joint Data Cleaning and Machine Learning ☆49 Updated last year
- automatic data slicing ☆34 Updated 4 years ago
- The official implementation of "The Shapley Value of Classifiers in Ensemble Games" (CIKM 2021). ☆220 Updated 2 years ago
- Spark implementation of computing Shapley Values using monte-carlo approximation ☆76 Updated 2 years ago
- Weakly Supervised End-to-End Learning (NeurIPS 2021) ☆156 Updated 2 years ago
- Distribution transparent Machine Learning experiments on Apache Spark ☆91 Updated last year
- SPEAR: Programmatically label and build training data quickly. ☆108 Updated last year
- A library of Reversible Data Transforms ☆126 Updated this week
- Flow with FlorDB 🌻 ☆154 Updated last week
- ☆22 Updated last year
- ☆104 Updated 11 months ago
- openclean - Data Cleaning and data profiling library for Python ☆80 Updated 3 years ago
- ☆32 Updated 4 years ago
- ☆29 Updated 3 years ago
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models. ☆42 Updated 2 years ago
- Train Gradient Boosting models that are both high-performance *and* Fair! ☆105 Updated last month
- Editing machine learning models to reflect human knowledge and values ☆127 Updated last year
- ☆33 Updated 4 years ago
- Metrics to evaluate quality and efficacy of synthetic datasets. ☆245 Updated 2 weeks ago
- Extremely simple and fast extreme multi-class and multi-label classifiers. ☆70 Updated 5 months ago
- Benchmarking synthetic data generation methods. ☆276 Updated last week
- Clustering for mixed-type data ☆99 Updated last year
- A tool facilitating matching for any dataset discovery method. Also, an extensible experiment suite for state-of-the-art schema matching … ☆91 Updated 3 months ago
- CinnaMon is a Python library which offers a number of tools to detect, explain, and correct data drift in a machine learning system ☆77 Updated 2 years ago
- Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning ☆51 Updated 2 years ago
- Coarse-grained lineage and tracing for machine learning pipelines. ☆469 Updated 2 years ago
- this repo might get accepted ☆28 Updated 4 years ago
- ☆100 Updated last week