schelterlabs / jenga
Jenga is an experimentation library that allows data science practitioners and researchers to study the effect of common data corruptions (e.g., missing values, broken character encodings) on the prediction quality of their ML models.
☆37 · Updated last year
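The kind of experiment described above can be illustrated with a minimal, library-independent sketch. It does not use jenga's API: the helper names (`inject_missing`, `accuracy`, `predict`), the toy data, and the threshold "model" are all hypothetical stand-ins chosen for illustration. The idea is to score a model on clean test data, corrupt a fraction of the test values, and score it again.

```python
import random


def inject_missing(values, fraction, rng):
    """Return a copy of `values` with a random `fraction` of entries set to None."""
    corrupted = list(values)
    n_corrupt = int(round(fraction * len(corrupted)))
    for i in rng.sample(range(len(corrupted)), n_corrupt):
        corrupted[i] = None
    return corrupted


def accuracy(predict_fn, xs, ys):
    """Fraction of examples for which the prediction equals the label."""
    return sum(predict_fn(x) == y for x, y in zip(xs, ys)) / len(ys)


rng = random.Random(0)

# Toy data (assumption): one numeric feature, label is 1 when it exceeds 0.5.
train_x = [rng.random() for _ in range(200)]
test_x = [rng.random() for _ in range(200)]
test_y = [int(x > 0.5) for x in test_x]

train_mean = sum(train_x) / len(train_x)


def predict(x):
    # Stand-in "model": impute a missing value with the training mean,
    # then apply a fixed threshold.
    if x is None:
        x = train_mean
    return int(x > 0.5)


clean_acc = accuracy(predict, test_x, test_y)

# Corrupt 30% of the test values and measure prediction quality again.
corrupted_x = inject_missing(test_x, 0.3, rng)
corrupted_acc = accuracy(predict, corrupted_x, test_y)

print(f"accuracy on clean data:     {clean_acc:.3f}")
print(f"accuracy on corrupted data: {corrupted_acc:.3f}")
```

Comparing the two scores gives a rough measure of how sensitive the model is to this particular corruption; a real study would repeat this over several corruption types, fractions, and random seeds.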
Alternatives and similar repositories for jenga:
Users who are interested in jenga are comparing it to the libraries listed below.
- Inspect ML Pipelines in Python in the form of a DAG ☆70 · Updated 10 months ago
- A Benchmark for Joint Data Cleaning and Machine Learning ☆45 · Updated 7 months ago
- Data-Centric What-If Analysis for Native Machine Learning Pipelines ☆15 · Updated last year
- Code repository for our paper "Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift": https://arxiv.org/abs/1810.119… ☆102 · Updated 9 months ago
- automatic data slicing ☆35 · Updated 3 years ago
- Model Agnostic Counterfactual Explanations ☆87 · Updated 2 years ago
- Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning ☆50 · Updated 2 years ago
- ☆21 · Updated last year
- this repo might get accepted ☆29 · Updated 3 years ago
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models. ☆42 · Updated last year
- Code for extracting, parsing and annotating tables from GitTables (https://gittables.github.io). ☆42 · Updated 3 years ago
- Weakly Supervised End-to-End Learning (NeurIPS 2021) ☆157 · Updated last year
- Editing machine learning models to reflect human knowledge and values ☆123 · Updated last year
- Data Cleaning for ML under the Certain Prediction Framework ☆11 · Updated 2 years ago
- Foundation Models for Data Tasks ☆102 · Updated last year
- openclean - Data Cleaning and data profiling library for Python ☆71 · Updated 3 years ago
- ☆32 · Updated 3 years ago
- Spark implementation of computing Shapley Values using monte-carlo approximation ☆74 · Updated last year
- Explaining Inference Queries with Bayesian Optimization ☆10 · Updated 3 years ago
- ☆94 · Updated 4 months ago
- SPEAR: Programmatically label and build training data quickly. ☆103 · Updated 6 months ago
- Train Gradient Boosting models that are both high-performance *and* Fair! ☆102 · Updated 6 months ago
- A library of Reversible Data Transforms ☆122 · Updated this week
- Python Interface of the Scalable Bayesian Rule Lists ☆19 · Updated 4 years ago
- Metrics to evaluate quality and efficacy of synthetic datasets. ☆220 · Updated this week
- Official Repository for EvalRS @ KDD 2023: a Rounded Evaluation of Recommender Systems ☆30 · Updated 11 months ago
- CinnaMon is a Python library which offers a number of tools to detect, explain, and correct data drift in a machine learning system ☆77 · Updated 2 years ago
- ☆17 · Updated 4 months ago
- A Scalable Auto-ML System ☆51 · Updated 2 years ago
- A benchmark of data-centric tasks from across the machine learning lifecycle. ☆72 · Updated 2 years ago