schelterlabs / jenga
Jenga is an experimentation library that allows data science practitioners and researchers to study the effect of common data corruptions (e.g., missing values, broken character encodings) on the prediction quality of their ML models.
☆39 Updated last year
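The kind of experiment described above can be sketched in plain Python. This is a minimal illustration only and does not use jenga's API: every function name here (`inject_missing`, `impute_mean`, `fit_centroids`, and so on) is hypothetical, and the synthetic data, the nearest-centroid model, and the mean-imputation repair step are all assumptions chosen to keep the sketch self-contained.

```python
import random


def make_data(n, rng):
    """Synthetic binary task: feature 0 is informative, feature 1 is noise."""
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        rows.append([rng.gauss(3.0 * y, 1.0), rng.gauss(0.0, 1.0)])
        labels.append(y)
    return rows, labels


def inject_missing(rows, column, fraction, rng):
    """Return a copy of rows with `fraction` of the values in `column` set to None."""
    corrupted = [list(r) for r in rows]
    n_missing = round(fraction * len(rows))
    for i in rng.sample(range(len(rows)), n_missing):
        corrupted[i][column] = None
    return corrupted


def impute_mean(rows, column, fill):
    """Replace None in `column` with a precomputed fill value (e.g. the training mean)."""
    return [[fill if (j == column and v is None) else v for j, v in enumerate(r)]
            for r in rows]


def fit_centroids(rows, labels):
    """Nearest-centroid classifier: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for r, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(r))
        for j, v in enumerate(r):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, row):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(row, centroids[y])))


def accuracy(centroids, rows, labels):
    hits = sum(predict(centroids, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


rng = random.Random(0)
train_x, train_y = make_data(500, rng)
test_x, test_y = make_data(500, rng)
model = fit_centroids(train_x, train_y)
train_mean = sum(r[0] for r in train_x) / len(train_x)

# Corrupt only the test data, repair it, and measure prediction quality.
results = {}
for fraction in (0.0, 0.25, 0.5, 0.75):
    corrupted = inject_missing(test_x, 0, fraction, rng)
    repaired = impute_mean(corrupted, 0, train_mean)
    results[fraction] = accuracy(model, repaired, test_y)
    print(f"missing={fraction:.0%} accuracy={results[fraction]:.3f}")
```

The model is trained once on clean data and evaluated on progressively more corrupted test data, so any drop in accuracy is attributable to the corruption and the repair strategy rather than to retraining.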
Alternatives and similar repositories for jenga:
Users who are interested in jenga are comparing it to the libraries listed below.
- Inspect ML Pipelines in Python in the form of a DAG ☆70 Updated last year
- Code repository for our paper "Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift": https://arxiv.org/abs/1810.119… ☆104 Updated last year
- automatic data slicing ☆35 Updated 3 years ago
- A Benchmark for Joint Data Cleaning and Machine Learning ☆46 Updated 9 months ago
- Data-Centric What-If Analysis for Native Machine Learning Pipelines ☆16 Updated last year
- ☆32 Updated 3 years ago
- SPEAR: Programmatically label and build training data quickly. ☆105 Updated 9 months ago
- A benchmark of data-centric tasks from across the machine learning lifecycle. ☆72 Updated 2 years ago
- Editing machine learning models to reflect human knowledge and values ☆124 Updated last year
- The official implementation of "The Shapley Value of Classifiers in Ensemble Games" (CIKM 2021). ☆219 Updated last year
- Code and data for Sato https://arxiv.org/abs/1911.06311. ☆112 Updated last year
- Code to reproduce the results in the paper Supervised Learning on Relational Databases with Graph Neural Networks. ☆62 Updated 5 years ago
- ☆22 Updated last year
- openclean - Data Cleaning and data profiling library for Python ☆75 Updated 3 years ago
- Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning ☆51 Updated 2 years ago
- The stream-learn is an open-source Python library for difficult data stream analysis. ☆63 Updated 3 weeks ago
- Model Agnostic Counterfactual Explanations ☆87 Updated 2 years ago
- A software package for privacy-preserving generation of a synthetic twin to a given sensitive data set. ☆51 Updated 7 months ago
- A novel approach for synthesizing tabular data using pretrained large language models ☆305 Updated 5 months ago
- Weakly Supervised End-to-End Learning (NeurIPS 2021) ☆156 Updated 2 years ago
- Benchmarking synthetic data generation methods. ☆271 Updated 2 weeks ago
- Clustering for mixed-type data ☆98 Updated 8 months ago
- Code for extracting, parsing and annotating tables from GitTables (https://gittables.github.io). ☆43 Updated 3 years ago
- Official Code Repo for the Paper: "How does This Interaction Affect Me? Interpretable Attribution for Feature Interactions", In NeurIPS 2… ☆39 Updated 2 years ago
- Metrics to evaluate quality and efficacy of synthetic datasets. ☆228 Updated this week
- ☆20 Updated 5 years ago
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models. ☆42 Updated last year
- PyTorch implementation of parity loss as constraints function to realize the fairness of machine learning. ☆73 Updated last year
- ☆19 Updated 7 months ago
- Train Gradient Boosting models that are both high-performance *and* Fair! ☆103 Updated 9 months ago