Resources for Data Centric AI
☆1,145 · Dec 13, 2023 · Updated 2 years ago
Alternatives and similar repositories for data-centric-ai
Users that are interested in data-centric-ai are comparing it to the libraries listed below.
- A benchmark of data-centric tasks from across the machine learning lifecycle. ☆70 · Jun 8, 2022 · Updated 4 years ago
- A curated, but incomplete, list of data-centric AI resources. ☆1,148 · Jun 26, 2024 · Updated last year
- Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖 ☆351 · Apr 7, 2026 · Updated 2 months ago
- A system for quickly generating training data with weak supervision ☆5,975 · Jun 8, 2026 · Updated last week
- Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data … ☆11,511 · Jan 13, 2026 · Updated 5 months ago
- More interactive weak supervision with FlyingSquid ☆315 · Sep 1, 2020 · Updated 5 years ago
- Weakly Supervised End-to-End Learning (NeurIPS 2021) ☆155 · Mar 20, 2023 · Updated 3 years ago
- ☆141 · Oct 30, 2023 · Updated 2 years ago
- skweak: A software toolkit for weak supervision applied to NLP tasks ☆925 · Sep 2, 2024 · Updated last year
- Explore and understand your training and validation data. ☆849 · Dec 24, 2024 · Updated last year
- Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets ☆5,007 · Updated this week
- Robustness Gym is an evaluation toolkit for machine learning. ☆446 · Jun 28, 2022 · Updated 3 years ago
- ☆16 · Nov 30, 2022 · Updated 3 years ago
- Curated list of open source tooling for data-centric AI on unstructured data. ☆732 · Nov 15, 2023 · Updated 2 years ago
- Lightweight implementations of generative label models for weakly supervised machine learning ☆24 · Apr 4, 2026 · Updated 2 months ago
- Data for "Datamodels: Predicting Predictions with Training Data" ☆96 · May 25, 2023 · Updated 3 years ago
- Introduction to Data-Centric AI, MIT IAP 2024 🤖 ☆108 · Jun 27, 2025 · Updated 11 months ago
- Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽‍💻 ☆480 · Feb 24, 2025 · Updated last year
- Data augmentation for NLP ☆4,658 · Jun 12, 2026 · Updated last week
- A modular active learning framework for Python ☆2,354 · Feb 26, 2024 · Updated 2 years ago
- An end-to-end implementation of intent prediction with Metaflow and other cool tools ☆876 · Jun 16, 2023 · Updated 3 years ago
- Beyond Accuracy: Behavioral Testing of NLP models with CheckList ☆2,048 · Jan 9, 2024 · Updated 2 years ago
- A data augmentations library for audio, image, text, and video. ☆5,085 · Jun 1, 2026 · Updated 2 weeks ago
- Aioli: A unified optimization framework for language model data mixing ☆32 · Jan 17, 2025 · Updated last year
- A PyTorch-based open-source framework that provides methods for improving the weakly annotated data and allows researchers to efficiently… ☆108 · Sep 10, 2024 · Updated last year
- This repo contains data and code for the paper "Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Da… ☆497 · Mar 26, 2024 · Updated 2 years ago
- Behavioral "black-box" testing for recommender systems ☆473 · Aug 9, 2023 · Updated 2 years ago
- Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling