schelterlabs / jenga
Jenga is an experimentation library that allows data science practitioners and researchers to study the effect of common data corruptions (e.g., missing values, broken character encodings) on the prediction quality of their ML models.
☆41 · Updated 2 years ago
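Jenga's own API is not documented on this page, so the following is only a hypothetical, stdlib-only sketch of the kind of experiment described above: train a model on clean data, corrupt a fraction of the test cells with missing values, repair them naively, and compare prediction quality. The synthetic data, the nearest-centroid model, the mean-imputation step, and every function name (`inject_missing`, `run_experiment`, etc.) are illustrative assumptions, not Jenga's interface.

```python
import random
from statistics import mean


def make_data(n, rng):
    """Hypothetical 2-feature binary data: class 1 centered at (2, 2), class 0 at (0, 0)."""
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 2.0 if y == 1 else 0.0
        rows.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        labels.append(y)
    return rows, labels


def fit_centroids(rows, labels):
    """Nearest-centroid classifier (an assumed stand-in model): per-class feature means."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [mean(col) for col in zip(*members)]
    return centroids


def predict(centroids, row):
    """Return the class whose centroid has the smallest squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda cls: dist(centroids[cls]))


def inject_missing(rows, fraction, rng):
    """Corruption: replace each cell with None with probability `fraction`."""
    return [[None if rng.random() < fraction else v for v in row] for row in rows]


def impute(rows, fill):
    """Naive repair: replace None with the per-column training mean."""
    return [[fill[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def accuracy(centroids, rows, labels):
    hits = sum(predict(centroids, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


def run_experiment(fraction, seed=0):
    """Return (clean accuracy, corrupted accuracy) for one corruption level."""
    rng = random.Random(seed)
    train_x, train_y = make_data(500, rng)
    test_x, test_y = make_data(500, rng)
    model = fit_centroids(train_x, train_y)
    col_means = [mean(col) for col in zip(*train_x)]
    corrupted = impute(inject_missing(test_x, fraction, rng), col_means)
    return accuracy(model, test_x, test_y), accuracy(model, corrupted, test_y)


if __name__ == "__main__":
    for frac in (0.0, 0.25, 0.5):
        clean, dirty = run_experiment(frac)
        print(f"missing={frac:.2f} clean_acc={clean:.3f} corrupted_acc={dirty:.3f}")
```

Because the train and test sets are drawn before the corruption step with a fixed seed, every corruption level is evaluated on the same underlying data, so any change in accuracy is attributable to the injected missing values alone.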
Alternatives and similar repositories for jenga
Users who are interested in jenga are comparing it to the libraries listed below.
- Inspect ML Pipelines in Python in the form of a DAG ☆70 · Updated last year
- A Benchmark for Joint Data Cleaning and Machine Learning ☆49 · Updated last year
- Code repository for our paper "Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift": https://arxiv.org/abs/1810.119… ☆107 · Updated last year
- The official implementation of "The Shapley Value of Classifiers in Ensemble Games" (CIKM 2021). ☆221 · Updated 2 years ago
- automatic data slicing ☆34 · Updated 4 years ago
- Weakly Supervised End-to-End Learning (NeurIPS 2021) ☆157 · Updated 2 years ago
- Spark implementation of computing Shapley Values using monte-carlo approximation ☆78 · Updated 2 years ago
- openclean - Data Cleaning and data profiling library for Python ☆82 · Updated 4 years ago
- Metrics to evaluate quality and efficacy of synthetic datasets. ☆251 · Updated last week
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models. ☆41 · Updated 2 years ago
- Distribution transparent Machine Learning experiments on Apache Spark ☆91 · Updated last year
- ☆104 · Updated last year
- this repo might get accepted ☆28 · Updated 4 years ago
- A library of Reversible Data Transforms ☆128 · Updated this week
- Code for extracting, parsing and annotating tables from GitTables (https://gittables.github.io). ☆45 · Updated 3 years ago
- SPEAR: Programmatically label and build training data quickly. ☆109 · Updated last year
- Benchmarking synthetic data generation methods. ☆287 · Updated this week
- ☆22 · Updated 2 years ago
- A Tree Search Library for Data Cleaning ☆22 · Updated 3 years ago
- FlorDB 🌻 ☆155 · Updated last month
- ☆29 · Updated 4 years ago
- A tool facilitating matching for any dataset discovery method. Also, an extensible experiment suite for state-of-the-art schema matching … ☆97 · Updated last month
- A novel approach for synthesizing tabular data using pretrained large language models ☆329 · Updated 2 weeks ago
- ☆32 · Updated 4 years ago
- Tabular feature encoding pipelines for machine learning with options for string parsing, missing data infill, and stochastic perturbation… ☆164 · Updated 4 months ago
- A distributed Spark/Scala implementation of the isolation forest algorithm for unsupervised outlier detection, featuring support for scal… ☆249 · Updated last week
- SparkER: an Entity Resolution framework for Apache Spark ☆65 · Updated last year
- Data-Centric What-If Analysis for Native Machine Learning Pipelines ☆16 · Updated 2 years ago
- Code and data for Sato https://arxiv.org/abs/1911.06311. ☆116 · Updated last year
- ✂️ Fast slice finding for Machine Learning model debugging. ☆97 · Updated last month