Open-Source Software, Tutorials, and Research on Data-Centric AI
★348 · Apr 7, 2026 · Updated last month
Alternatives and similar repositories for awesome-data-centric-ai
Users that are interested in awesome-data-centric-ai are comparing it to the libraries listed below.
- Tutorials for YData's Fabric platform ★36 · May 12, 2025 · Updated 11 months ago
- Resources for Data Centric AI ★1,141 · Dec 13, 2023 · Updated 2 years ago
- Synthetic data generators for tabular and time-series data ★1,630 · Apr 23, 2026 · Updated 2 weeks ago
- Curated list of open source tooling for data-centric AI on unstructured data. ★733 · Nov 15, 2023 · Updated 2 years ago
- Data Quality assessment with one line of code ★456 · Apr 23, 2026 · Updated 2 weeks ago
- A curated, but incomplete, list of data-centric AI resources. ★1,147 · Jun 26, 2024 · Updated last year
- ★30 · Feb 9, 2023 · Updated 3 years ago
- 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames. ★13,534 · Apr 22, 2026 · Updated 2 weeks ago
- Standardised Metrics and Methods for Synthetic Tabular Data Evaluation ★37 · Aug 14, 2024 · Updated last year
- Fabric SDK to interact with the Fabric platform ★22 · Mar 4, 2026 · Updated 2 months ago
- Dvc + Streamlit = ❤️ ★40 · Oct 27, 2023 · Updated 2 years ago
- Open Source Data Annotation & Labeling Tools ★694 · Apr 7, 2026 · Updated last month
- A tool to generate stubs for Python packages using numpydoc-format docstrings and monkeytype traces ★13 · Mar 14, 2024 · Updated 2 years ago
- NIST Collaborative Research Cycle on Synthetic Data. Learn about Synthetic Data week by week! ★27 · Jul 13, 2023 · Updated 2 years ago
- A copier template repository for a e2e batch ZenML MLOps pipeline. ★12 · Dec 17, 2025 · Updated 4 months ago
- Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data … ★11,448 · Jan 13, 2026 · Updated 3 months ago
- ★15 · Jul 16, 2014 · Updated 11 years ago
- A curated list of MLOps projects, tools and resources ★187 · Apr 22, 2024 · Updated 2 years ago
- The active learning algorithm, mismatch-first farthest-traversal. Implementation and visualization. ★12 · Dec 25, 2021 · Updated 4 years ago
- Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML va… ★4,012 · Dec 28, 2025 · Updated 4 months ago
- Modern development with Python in 2024 ★12 · Apr 27, 2026 · Updated last week
- a catch-all repo ★11 · Dec 28, 2023 · Updated 2 years ago
- nannyml: post-deployment data science in python ★2,140 · Jul 12, 2025 · Updated 9 months ago
- ★13 · May 12, 2023 · Updated 2 years ago
- A starter vault in Obsidian for both work and personal knowledge management, complete with seamless workflows. ★15 · Nov 11, 2025 · Updated 5 months ago
- ML REPA Library: MLOps and ML Engineering Solutions for Success ★23 · Jun 26, 2023 · Updated 2 years ago
- ZenML: One AI Platform from Pipelines to Agents. https://zenml.io. ★5,399 · Updated this week
- ★27 · Oct 13, 2022 · Updated 3 years ago
- ★12 · Sep 21, 2023 · Updated 2 years ago
- Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽‍💻 ★480 · Feb 24, 2025 · Updated last year
- cleanpy is a CLI tool to remove caches and temporary files related to Python. ★19 · Apr 13, 2026 · Updated 3 weeks ago
- An open-source data logging library for machine learning models and data pipelines. Provides visibility into data quality & model perf… ★2,816 · Jan 10, 2025 · Updated last year
- ★14 · Aug 18, 2023 · Updated 2 years ago
- A curated list of awesome resources related to Semantic Search and Semantic Similarity tasks. ★362 · Dec 9, 2025 · Updated 4 months ago
- A curated list of awesome MLOps tools ★5,125 · Apr 29, 2026 · Updated last week
- A curated list of references for MLOps ★13,887 · Nov 21, 2024 · Updated last year
- This repository aims to map the ecosystem of artificial intelligence guidelines, principles, codes of ethics, standards, regulation and b… ★1,431 · Apr 18, 2026 · Updated 2 weeks ago
- A tool for quickly adding labels to unlabeled datasets ★20 · Jan 12, 2024 · Updated 2 years ago
- Easy-to-use self-supervised representation learning for industrial AI ★25 · Feb 23, 2023 · Updated 3 years ago