schelterlabs / jenga
Jenga is an experimentation library that allows data science practitioners and researchers to study the effect of common data corruptions (e.g., missing values, broken character encodings) on the prediction quality of their ML models.
☆41 · Updated 2 years ago
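The kind of experiment described above can be sketched without the library itself. The snippet below is a minimal, stdlib-only illustration and does NOT use jenga's API. Every helper (`make_data`, `train_threshold`, `corrupt_missing`, `corrupt_encoding`, `run_experiment`) is hypothetical. The setup is also assumed: a synthetic one-feature task, a simple threshold classifier, and missing values imputed with a constant `fill_value=0.0`. It trains on clean data, injects missing values into a growing fraction of the test rows, and reports how accuracy changes. It also shows one way to simulate a broken character encoding, namely UTF-8 bytes mis-decoded as Latin-1.

```python
import random

# Hypothetical helpers for illustration only; this is NOT the jenga API.

def make_data(n, seed=0):
    """Synthetic binary task: label is 1 when feature x > 0.5."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        rows.append({"x": x, "label": int(x > 0.5)})
    return rows

def train_threshold(rows):
    """Fit a 1-D threshold: the midpoint between the two class means."""
    pos = [r["x"] for r in rows if r["label"] == 1]
    neg = [r["x"] for r in rows if r["label"] == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, rows, fill_value=0.0):
    """Predict labels; missing values (None) are imputed with fill_value (assumed strategy)."""
    preds = []
    for r in rows:
        x = r["x"] if r["x"] is not None else fill_value
        preds.append(int(x > threshold))
    return preds

def accuracy(rows, preds):
    """Fraction of rows whose label matches the prediction."""
    return sum(int(r["label"] == p) for r, p in zip(rows, preds)) / len(rows)

def corrupt_missing(rows, fraction, seed=1):
    """Return a copy of rows where a random `fraction` of 'x' values is set to None."""
    rng = random.Random(seed)
    out = [dict(r) for r in rows]  # copy so the clean data stays untouched
    for r in rng.sample(out, int(fraction * len(out))):
        r["x"] = None
    return out

def corrupt_encoding(text):
    """Simulate mojibake: encode as UTF-8, then wrongly decode as Latin-1."""
    return text.encode("utf-8").decode("latin-1")

def run_experiment(fractions=(0.0, 0.2, 0.5)):
    """Train on clean data, then measure test accuracy under increasing missingness."""
    train = make_data(1000, seed=0)
    test = make_data(500, seed=42)
    threshold = train_threshold(train)
    results = {}
    for frac in fractions:
        corrupted = corrupt_missing(test, frac)
        results[frac] = accuracy(corrupted, predict(threshold, corrupted))
    return results

if __name__ == "__main__":
    for frac, acc in run_experiment().items():
        print(f"missing={frac:.0%}  accuracy={acc:.3f}")
    print(corrupt_encoding("café"))  # prints "cafÃ©"
```

Constant imputation is deliberately naive here. Every corrupted row is pushed to the same side of the threshold, so accuracy falls roughly in proportion to the corrupted fraction. Swapping in mean imputation or a model that handles missing values natively is exactly the kind of comparison such experiments are meant to support.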
Alternatives and similar repositories for jenga
Users who are interested in jenga are comparing it to the libraries listed below.
- Inspect ML Pipelines in Python in the form of a DAG ☆70 · Updated last year
- Code repository for our paper "Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift": https://arxiv.org/abs/1810.119… ☆105 · Updated last year
- A Benchmark for Joint Data Cleaning and Machine Learning ☆49 · Updated last year
- A Tree Search Library for Data Cleaning ☆22 · Updated 3 years ago
- ☆22 · Updated last year
- automatic data slicing ☆34 · Updated 3 years ago
- Weakly Supervised End-to-End Learning (NeurIPS 2021) ☆157 · Updated 2 years ago
- The official implementation of "The Shapley Value of Classifiers in Ensemble Games" (CIKM 2021). ☆220 · Updated 2 years ago
- Spark implementation of computing Shapley Values using monte-carlo approximation ☆75 · Updated 2 years ago
- A library of Reversible Data Transforms ☆127 · Updated last week
- openclean - Data Cleaning and data profiling library for Python ☆80 · Updated 3 years ago
- Benchmarking synthetic data generation methods. ☆275 · Updated this week
- Metrics to evaluate quality and efficacy of synthetic datasets. ☆243 · Updated this week
- Distribution transparent Machine Learning experiments on Apache Spark ☆91 · Updated last year
- ☆32 · Updated 3 years ago
- Extra functionalities for river ☆14 · Updated last year
- Measuring data importance over ML pipelines using the Shapley value. ☆43 · Updated 2 months ago
- Flow with FlorDB 🌻 ☆154 · Updated 2 months ago
- this repo might get accepted ☆28 · Updated 4 years ago
- ☆104 · Updated 10 months ago
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models. ☆42 · Updated 2 years ago
- SPEAR: Programmatically label and build training data quickly. ☆107 · Updated last year
- A novel approach for synthesizing tabular data using pretrained large language models ☆317 · Updated last month
- Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning ☆51 · Updated 2 years ago
- Code and data for Sato https://arxiv.org/abs/1911.06311. ☆112 · Updated last year
- ☆29 · Updated 3 years ago
- Repository for my master thesis on automated string handling ☆16 · Updated 4 years ago
- Code for extracting, parsing and annotating tables from GitTables (https://gittables.github.io). ☆43 · Updated 3 years ago
- This repository provides data and scripts to use Sherlock, a DL-based model for semantic data type detection: https://sherlock.media.mit.… ☆167 · Updated last year
- CinnaMon is a Python library which offers a number of tools to detect, explain, and correct data drift in a machine learning system ☆77 · Updated 2 years ago