Curated list of open source tooling for data-centric AI on unstructured data.
☆732 · Nov 15, 2023 · Updated 2 years ago
Alternatives and similar repositories for awesome-open-data-centric-ai
Users that are interested in awesome-open-data-centric-ai are comparing it to the libraries listed below.
- Easy-to-use self-supervised representation learning for industrial AI ☆25 · Feb 23, 2023 · Updated 3 years ago
- A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning ☆20,607 · Jun 4, 2026 · Updated last week
- Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖 ☆351 · Apr 7, 2026 · Updated 2 months ago
- A curated list of references for MLOps ☆13,930 · Nov 21, 2024 · Updated last year
- Interactively explore unstructured datasets from your dataframe. ☆1,257 · Updated this week
- Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data … ☆11,511 · Jan 13, 2026 · Updated 5 months ago
- Resources for Data Centric AI ☆1,145 · Dec 13, 2023 · Updated 2 years ago
- Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML va… ☆4,023 · Dec 28, 2025 · Updated 5 months ago
- Open Source Data Annotation & Labeling Tools ☆706 · Jun 2, 2026 · Updated last week
- An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model perf… ☆2,820 · Jan 10, 2025 · Updated last year
- nannyml: post-deployment data science in python ☆2,142 · Jul 12, 2025 · Updated 11 months ago
- The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more! ☆8,670 · Jun 3, 2026 · Updated last week
- fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quali… ☆1,858 · Apr 14, 2026 · Updated 2 months ago
- Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. Fro… ☆7,590 · May 2, 2026 · Updated last month
- Algorithms for outlier, adversarial and drift detection ☆2,522 · Dec 11, 2025 · Updated 6 months ago
- 🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of… ☆2,314 · Jun 1, 2026 · Updated 2 weeks ago
- [MLSys 2023] Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models ☆16 · May 5, 2023 · Updated 3 years ago
- A curated list of awesome MLOps tools ☆5,174 · Apr 29, 2026 · Updated last month
- ZenML 🙏: One AI Platform from Pipelines to Agents. https://zenml.io. ☆5,440 · Updated this week
- Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽‍💻 ☆480 · Feb 24, 2025 · Updated last year
- Aim 💫 — An easy-to-use & supercharged open-source experiment tracker. ☆6,155 · Updated this week
- A curated, but incomplete, list of data-centric AI resources. ☆1,148 · Jun 26, 2024 · Updated last year
- Deeplake is AI Data Runtime for Agents. It provides serverless postgres with a multimodal datalake, enabling scalable retrieval and train… ☆9,168 · May 21, 2026 · Updated 3 weeks ago
- A scalable & efficient active learning/data selection system for everyone. ☆219 · Jul 8, 2024 · Updated last year
- Always know what to expect from your data. ☆11,556 · Updated this week
- Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and… ☆10,887 · Updated this week
- Algorithms for explaining machine learning models ☆2,628 · Oct 17, 2025 · Updated 7 months ago
- A UI designer for constructing AI applications with OpenSearch ☆16 · Updated this week
- The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, a… ☆26,506 · Updated this week
- The first open Federated Learning framework implemented in C++ and Python. ☆521 · Jun 27, 2024 · Updated last year
- Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets ☆4,996 · Jun 8, 2026 · Updated last week
- Working group for ongoing development and iteration of the Singer Spec, the de-facto protocol for open source data connectors. Please use … ☆15 · Mar 15, 2025 · Updated last year
- Learn how to design, develop, deploy and iterate on production-grade ML applications. ☆3,380 · Aug 16, 2024 · Updated last year
- 🦉 Data Versioning and ML Experiments ☆15,675 · Jun 8, 2026 · Updated last week
- 📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production. ☆29,737 · Jul 18, 2024 · Updated last year
- 🌊 Online machine learning in Python ☆5,832 · Jun 4, 2026 · Updated last week
- Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, … ☆3,224 · Mar 20, 2025 · Updated last year
- ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling … ☆6,727 · Updated this week
- An end-to-end implementation of intent prediction with Metaflow and other cool tools ☆876 · Jun 16, 2023 · Updated 2 years ago