schelterlabs / jenga
Jenga is an experimentation library that allows data science practitioners and researchers to study the effect of common data corruptions (e.g., missing values, broken character encodings) on the prediction quality of their ML models.
☆38 · Updated last year
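The kind of experiment described above can be illustrated with a minimal sketch. It does not use Jenga's API: `inject_missing`, `predict`, and `accuracy` are hypothetical helpers, and the data and threshold "model" are synthetic, chosen only to show how blanking out values in a feature can lower prediction quality.

```python
import random


def inject_missing(rows, column, fraction, seed=0):
    """Return a copy of rows with `fraction` of the values in `column` set to None."""
    rng = random.Random(seed)
    corrupted = [dict(row) for row in rows]  # copy so the clean data stays intact
    n_missing = int(round(fraction * len(corrupted)))
    for idx in rng.sample(range(len(corrupted)), n_missing):
        corrupted[idx][column] = None
    return corrupted


def predict(row, threshold=0.5, default=0):
    """Toy model: predict 1 if x exceeds the threshold; fall back to `default` when x is missing."""
    if row["x"] is None:
        return default
    return 1 if row["x"] > threshold else 0


def accuracy(rows):
    """Fraction of rows where the toy model's prediction matches the label."""
    return sum(predict(r) == r["label"] for r in rows) / len(rows)


# Synthetic data: the label is fully determined by x, so the toy model is perfect on clean data.
data_rng = random.Random(42)
clean = []
for _ in range(1000):
    x = data_rng.random()
    clean.append({"x": x, "label": 1 if x > 0.5 else 0})

# Corrupt 30% of the x values and compare prediction quality before and after.
corrupted = inject_missing(clean, "x", fraction=0.3)
clean_acc = accuracy(clean)
corrupted_acc = accuracy(corrupted)
print(f"accuracy on clean data:     {clean_acc:.3f}")
print(f"accuracy on corrupted data: {corrupted_acc:.3f}")
```

The same pattern extends to other corruptions (for example, garbling the characters of a text column) by swapping in a different corruption function.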
Alternatives and similar repositories for jenga:
Users who are interested in jenga are comparing it to the libraries listed below:
- Inspect ML Pipelines in Python in the form of a DAG ☆70 · Updated last year
- automatic data slicing ☆35 · Updated 3 years ago
- Code repository for our paper "Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift": https://arxiv.org/abs/1810.119… ☆103 · Updated 11 months ago
- ☆21 · Updated last year
- A Benchmark for Joint Data Cleaning and Machine Learning ☆46 · Updated 8 months ago
- Data-Centric What-If Analysis for Native Machine Learning Pipelines ☆16 · Updated last year
- SPEAR: Programmatically label and build training data quickly. ☆105 · Updated 8 months ago
- ☆32 · Updated 3 years ago
- Spark implementation of computing Shapley Values using monte-carlo approximation ☆74 · Updated last year
- openclean - Data Cleaning and data profiling library for Python ☆74 · Updated 3 years ago
- The official implementation of "The Shapley Value of Classifiers in Ensemble Games" (CIKM 2021). ☆218 · Updated last year
- Train Gradient Boosting models that are both high-performance *and* Fair! ☆103 · Updated 8 months ago
- ☆29 · Updated 3 years ago
- Measuring data importance over ML pipelines using the Shapley value. ☆37 · Updated last month
- Foundation Models for Data Tasks ☆102 · Updated last year
- Editing machine learning models to reflect human knowledge and values ☆124 · Updated last year
- Code and data for Sato https://arxiv.org/abs/1911.06311. ☆112 · Updated last year
- Numba-based version of DimmWitted Gibbs sampler ☆45 · Updated 6 years ago
- Model Agnostic Counterfactual Explanations ☆87 · Updated 2 years ago
- Distribution transparent Machine Learning experiments on Apache Spark ☆90 · Updated last year
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models. ☆42 · Updated last year
- Code for extracting, parsing and annotating tables from GitTables (https://gittables.github.io). ☆43 · Updated 3 years ago
- Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning ☆51 · Updated 2 years ago
- Weakly Supervised End-to-End Learning (NeurIPS 2021) ☆156 · Updated last year
- This project focuses on DeepER, a deep learning framework for entity resolution (record deduplication). It examines how DeepER performs o… ☆46 · Updated 6 years ago
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆26 · Updated 3 months ago
- Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning (AISTATS 2022 Oral) ☆40 · Updated 2 years ago
- A benchmark of data-centric tasks from across the machine learning lifecycle. ☆72 · Updated 2 years ago
- Template-based generation of DAG cards from Metaflow classes, inspired by Google cards for machine learning models. ☆30 · Updated 3 years ago