schelterlabs / jenga
Jenga is an experimentation library that allows data science practitioners and researchers to study the effect of common data corruptions (e.g., missing values, broken character encodings) on the prediction quality of their ML models.
☆39Updated last year
Alternatives and similar repositories for jenga:
Users who are interested in jenga are comparing it to the libraries listed below.
- Inspect ML Pipelines in Python in the form of a DAG☆70Updated last year
- Code repository for our paper "Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift": https://arxiv.org/abs/1810.119…☆104Updated last year
- A Benchmark for Joint Data Cleaning and Machine Learning☆48Updated 10 months ago
- automatic data slicing☆34Updated 3 years ago
- SPEAR: Programmatically label and build training data quickly.☆106Updated 10 months ago
- Data Cleaning for ML under the Certain Prediction Framework☆11Updated 3 years ago
- Spark implementation of computing Shapley Values using monte-carlo approximation☆74Updated 2 years ago
- Editing machine learning models to reflect human knowledge and values☆124Updated last year
- A benchmark of data-centric tasks from across the machine learning lifecycle.☆72Updated 2 years ago
- openclean - Data Cleaning and data profiling library for Python☆78Updated 3 years ago
- The official implementation of "The Shapley Value of Classifiers in Ensemble Games" (CIKM 2021).☆220Updated last year
- Data-Centric What-If Analysis for Native Machine Learning Pipelines☆16Updated last year
- Weakly Supervised End-to-End Learning (NeurIPS 2021)☆156Updated 2 years ago
- Model Agnostic Counterfactual Explanations☆87Updated 2 years ago
- A library of Reversible Data Transforms☆124Updated this week
- This project focuses on DeepER, a deep learning framework for entity resolution (record deduplication). It examines how DeepER performs o…☆47Updated 6 years ago
- Template-based generation of DAG cards from Metaflow classes, inspired by Google cards for machine learning models.☆30Updated 3 years ago
- Code for extracting, parsing and annotating tables from GitTables (https://gittables.github.io).☆44Updated 3 years ago
- Distribution transparent Machine Learning experiments on Apache Spark☆90Updated last year
- Benchmarking synthetic data generation methods.☆273Updated this week
- Python Interface of the Scalable Bayesian Rule Lists☆20Updated 5 years ago
- Measuring data importance over ML pipelines using the Shapley value.☆40Updated 3 months ago
- Metrics to evaluate quality and efficacy of synthetic datasets.☆231Updated 3 weeks ago