schelterlabs / jenga
Jenga is an experimentation library that allows data science practitioners and researchers to study the effect of common data corruptions (e.g., missing values, broken character encodings) on the prediction quality of their ML models.
☆39 · Updated last year
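The kind of experiment described above can be illustrated with a minimal, self-contained sketch. It does not use Jenga's API: every name here (`make_data`, `fit_centroids`, `corrupt_missing`, and so on) is hypothetical, and the toy data, the nearest-centroid model, and the constant imputation are assumptions chosen so that the example runs with only the Python standard library.

```python
import random


def make_data(n, rng):
    """Toy data (assumption): two 2-D Gaussian blobs, class 0 near (0, 0), class 1 near (3, 3)."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 3.0 * label
        features = [rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)]
        rows.append((features, label))
    return rows


def fit_centroids(rows):
    """Stand-in 'model' (assumption): the per-class mean of each feature."""
    centroids = {}
    for label in (0, 1):
        points = [x for x, y in rows if y == label]
        centroids[label] = [sum(p[i] for p in points) / len(points) for i in range(2)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(centroids, x) == y for x, y in rows) / len(rows)


def corrupt_missing(rows, fraction, rng, fill=0.0):
    """Simulate missing values: each feature cell is lost with probability
    `fraction` and then imputed with the constant `fill`."""
    return [
        ([fill if rng.random() < fraction else v for v in x], y)
        for x, y in rows
    ]


rng = random.Random(0)            # fixed seed so the run is repeatable
train = make_data(500, rng)
test = make_data(500, rng)

model = fit_centroids(train)      # train on clean data only
clean_acc = accuracy(model, test)
corrupted_acc = accuracy(model, corrupt_missing(test, 0.5, rng))

print(f"accuracy on clean test data:     {clean_acc:.3f}")
print(f"accuracy on corrupted test data: {corrupted_acc:.3f}")
```

The model is trained once on clean data and evaluated twice, so any difference between the two scores is attributable to the injected corruption rather than to retraining. Sweeping `fraction` over several values would give a rough picture of how quickly prediction quality degrades.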
Alternatives and similar repositories for jenga:
Users who are interested in jenga are comparing it to the libraries listed below.
- Inspect ML Pipelines in Python in the form of a DAG ☆70 · Updated last year
- A Benchmark for Joint Data Cleaning and Machine Learning ☆46 · Updated 9 months ago
- Data-Centric What-If Analysis for Native Machine Learning Pipelines ☆16 · Updated last year
- Code repository for our paper "Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift": https://arxiv.org/abs/1810.119… ☆104 · Updated last year
- Code for extracting, parsing and annotating tables from GitTables (https://gittables.github.io). ☆43 · Updated 3 years ago
- automatic data slicing ☆35 · Updated 3 years ago
- ☆22 · Updated last year
- ☆32 · Updated 3 years ago
- Foundation Models for Data Tasks ☆105 · Updated last year
- Train Gradient Boosting models that are both high-performance *and* Fair! ☆103 · Updated 9 months ago
- Explaining Inference Queries with Bayesian Optimization ☆10 · Updated 4 years ago
- SPEAR: Programmatically label and build training data quickly. ☆105 · Updated 9 months ago
- Spark implementation of computing Shapley Values using monte-carlo approximation ☆74 · Updated 2 years ago
- Characterization of relational table embeddings (VLDB 2024). ☆25 · Updated 9 months ago
- A Tree Search Library for Data Cleaning ☆22 · Updated 3 years ago
- Code for the CIKM 2019 Paper "Fast and Accurate Network Embeddings via Very Sparse Random Projection" ☆57 · Updated 5 years ago
- openclean - Data Cleaning and data profiling library for Python ☆75 · Updated 3 years ago
- A practical Active Learning python package with a strong focus on experiments. ☆51 · Updated 2 years ago
- Data Cleaning for ML under the Certain Prediction Framework ☆11 · Updated 3 years ago
- Implementation of the paper "Deep Indexed Active Learning for Matching Heterogeneous Entity Representations" ☆16 · Updated 3 years ago
- Flow with FlorDB 🌻 ☆155 · Updated last month
- Repository for performing Blocking using Deep Learning based on the paper "Deep Learning for Blocking in Entity Matching: A Design Space … ☆31 · Updated last year
- Code for the paper "Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond… ☆22 · Updated 2 years ago
- ☆32 · Updated 3 years ago
- Applications using Parallel Graph AnalytiX (PGX) from Oracle Labs ☆49 · Updated 2 months ago
- Repository with an overview of the tutorial on Models and Practice of Neural Table Representations and up to date material for the hands-… ☆20 · Updated last year
- Continuous Benchmark of Filtering methods for Entity Resolution ☆9 · Updated 8 months ago
- ☆19 · Updated 7 months ago
- Distributed Bayesian Entity Resolution in Apache Spark ☆57 · Updated 3 years ago
- Code to reproduce the results in the paper Supervised Learning on Relational Databases with Graph Neural Networks. ☆62 · Updated 5 years ago