Resources for Data Centric AI
☆1,134 · Dec 13, 2023 · Updated 2 years ago
Alternatives and similar repositories for data-centric-ai
Users interested in data-centric-ai are comparing it to the libraries listed below.
- A benchmark of data-centric tasks from across the machine learning lifecycle. ☆71 · Jun 8, 2022 · Updated 3 years ago
- A curated, but incomplete, list of data-centric AI resources. ☆1,141 · Jun 26, 2024 · Updated last year
- Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖 ☆345 · Feb 10, 2026 · Updated 3 weeks ago
- A system for quickly generating training data with weak supervision ☆5,939 · May 2, 2024 · Updated last year
- Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data … ☆11,346 · Jan 13, 2026 · Updated last month
- More interactive weak supervision with FlyingSquid ☆317 · Sep 1, 2020 · Updated 5 years ago
- (no description) ☆141 · Oct 30, 2023 · Updated 2 years ago
- Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets ☆4,884 · Updated this week
- skweak: A software toolkit for weak supervision applied to NLP tasks ☆926 · Sep 2, 2024 · Updated last year
- Weakly Supervised End-to-End Learning (NeurIPS 2021) ☆156 · Mar 20, 2023 · Updated 2 years ago
- Robustness Gym is an evaluation toolkit for machine learning. ☆445 · Jun 28, 2022 · Updated 3 years ago
- Explore and understand your training and validation data. ☆852 · Dec 24, 2024 · Updated last year
- Data for "Datamodels: Predicting Predictions with Training Data" ☆97 · May 25, 2023 · Updated 2 years ago
- Data augmentation for NLP ☆4,645 · Jun 24, 2024 · Updated last year
- A data augmentations library for audio, image, text, and video. ☆5,071 · Feb 13, 2026 · Updated 3 weeks ago
- An end-to-end implementation of intent prediction with Metaflow and other cool tools ☆873 · Jun 16, 2023 · Updated 2 years ago
- A modular active learning framework for Python ☆2,339 · Feb 26, 2024 · Updated 2 years ago
- VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images. ☆3,295 · Mar 3, 2024 · Updated 2 years ago
- FFCV: Fast Forward Computer Vision (and other ML workloads!) ☆2,985 · Jun 16, 2024 · Updated last year
- (ICML 2021) Mandoline: Model Evaluation under Distribution Shift ☆30 · Jun 14, 2021 · Updated 4 years ago
- Behavioral "black-box" testing for recommender systems ☆470 · Aug 9, 2023 · Updated 2 years ago
- Beyond Accuracy: Behavioral Testing of NLP models with CheckList ☆2,050 · Jan 9, 2024 · Updated 2 years ago
- Version control for machine learning ☆1,670 · Feb 25, 2025 · Updated last year
- A curated list of references for MLOps ☆13,717 · Nov 21, 2024 · Updated last year
- SPEAR: Programmatically label and build training data quickly. ☆109 · Jun 27, 2024 · Updated last year
- Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽‍💻 ☆479 · Feb 24, 2025 · Updated last year
- A PyTorch-based open-source framework that provides methods for improving the weakly annotated data and allows researchers to efficiently… ☆108 · Sep 10, 2024 · Updated last year
- NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations ☆786 · May 19, 2024 · Updated last year
- Aim 💫 — An easy-to-use & supercharged open-source experiment tracker. ☆6,017 · Updated this week
- Model interpretability and understanding for PyTorch ☆5,564 · Feb 26, 2026 · Updated last week
- (no description) ☆17 · Nov 30, 2022 · Updated 3 years ago
- Active Learning for Text Classification in Python ☆639 · Feb 1, 2026 · Updated last month
- Mistral: A strong, northwesterly wind: Framework for transparent and accessible large-scale language model training, built with Hugging F… ☆578 · Nov 10, 2023 · Updated 2 years ago
- Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. Fro… ☆7,272 · Feb 27, 2026 · Updated last week
- DataComp: In search of the next generation of multimodal datasets ☆772 · Apr 28, 2025 · Updated 10 months ago
- Your PyTorch AI Factory - Flash enables you to easily configure and run complex AI recipes for over 15 tasks across 7 data domains ☆1,731 · Oct 8, 2023 · Updated 2 years ago
- A machine learning benchmark of in-the-wild distribution shifts, with data loaders, evaluators, and default models. ☆592 · Jan 26, 2024 · Updated 2 years ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,915 · Updated this week
- Curated list of open source tooling for data-centric AI on unstructured data. ☆734 · Nov 15, 2023 · Updated 2 years ago