schelterlabs / jenga
Jenga is an experimentation library that allows data science practitioners and researchers to study the effect of common data corruptions (e.g., missing values, broken character encodings) on the prediction quality of their ML models.
☆40 Updated last year
Alternatives and similar repositories for jenga
Users who are interested in jenga are comparing it to the libraries listed below.
- Inspect ML Pipelines in Python in the form of a DAG ☆70 Updated last year
- A Benchmark for Joint Data Cleaning and Machine Learning ☆48 Updated 11 months ago
- Code repository for our paper "Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift": https://arxiv.org/abs/1810.119… ☆104 Updated last year
- automatic data slicing ☆34 Updated 3 years ago
- ☆22 Updated last year
- Code for extracting, parsing and annotating tables from GitTables (https://gittables.github.io). ☆44 Updated 3 years ago
- openclean - Data Cleaning and data profiling library for Python ☆79 Updated 3 years ago
- Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning ☆51 Updated 2 years ago
- ☆32 Updated 3 years ago
- Code for the CIKM 2019 Paper "Fast and Accurate Network Embeddings via Very Sparse Random Projection" ☆57 Updated 5 years ago
- A benchmark of data-centric tasks from across the machine learning lifecycle. ☆72 Updated 2 years ago
- ☆11 Updated 2 weeks ago
- A library of Reversible Data Transforms ☆127 Updated 2 weeks ago
- ☆19 Updated 9 months ago
- Measuring data importance over ML pipelines using the Shapley value. ☆42 Updated 2 weeks ago
- Spark implementation of computing Shapley Values using monte-carlo approximation ☆74 Updated 2 years ago
- ☆37 Updated 4 years ago
- ☆29 Updated 3 years ago
- Extra functionalities for river ☆14 Updated last year
- SPEAR: Programmatically label and build training data quickly. ☆106 Updated 11 months ago
- Metrics to evaluate quality and efficacy of synthetic datasets. ☆236 Updated this week
- Weakly Supervised End-to-End Learning (NeurIPS 2021) ☆157 Updated 2 years ago
- ☆103 Updated 8 months ago
- Record matching and entity resolution at scale in Spark ☆34 Updated last year
- Data Cleaning for ML under the Certain Prediction Framework ☆11 Updated 3 years ago
- The Data Linter identifies potential issues (lints) in your ML training data. ☆88 Updated 7 years ago
- Benchmarking synthetic data generation methods. ☆274 Updated this week
- Template-based generation of DAG cards from Metaflow classes, inspired by Google cards for machine learning models. ☆30 Updated 3 years ago
- The Tornado framework, designed and implemented for adaptive online learning and data stream mining in Python. ☆130 Updated last year
- The official implementation of "The Shapley Value of Classifiers in Ensemble Games" (CIKM 2021). ☆220 Updated last year