daochenzha/data-centric-AI

daochenzha / data-centric-AI

A curated, but incomplete, list of data-centric AI resources.

☆1,152

Alternatives and similar repositories for data-centric-AI

Users interested in data-centric-AI compare it to the repositories listed below.

HazyResearch / data-centric-ai
Resources for Data Centric AI
☆1,148 · Dec 13, 2023 · Updated 2 years ago
Data-Centric-AI-Community / awesome-data-centric-ai
Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖
☆355 · Jul 13, 2026 · Updated last week
BradyFU / Awesome-Multimodal-Large-Language-Models
Latest Advances on Multimodal Large Language Models
☆17,952 · Jul 2, 2026 · Updated 2 weeks ago
ynchuang / DiscoverPath
DiscoverPath, a KG-based retrieval system designed for biomedical research. This system aims to assist biomedical researchers in dynami…
☆28 · Oct 25, 2023 · Updated 2 years ago
ZigeW / data_management_LLM
Collection of training data management explorations for large language models
☆342 · Aug 2, 2024 · Updated last year
Mooler0410 / LLMsPracticalGuide
A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)
☆10,200 · Apr 8, 2026 · Updated 3 months ago
daochenzha / neuroshard
[MLSys 2023] Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models
☆16 · May 5, 2023 · Updated 3 years ago
RUCAIBox / LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
☆12,191 · Mar 11, 2025 · Updated last year
JunweiLiang / awesome_lists
Awesome Lists for Tenure-Track Assistant Professors and PhD students. (Survival guide for assistant professors and PhD students)
☆1,634 · Feb 1, 2024 · Updated 2 years ago
dqxiu / ICL_PaperList
Paper List for In-context Learning 🌷
☆876 · Oct 8, 2024 · Updated last year
Guang000 / Awesome-Dataset-Distillation
A curated list of awesome papers on dataset distillation and related applications.
☆1,964 · Updated this week
dcai-course / dcai-lab
Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽‍💻
☆483 · Feb 24, 2025 · Updated last year
yfzhang114 / Generalization-Causality
Reading notes on a wide range of research on domain generalization, domain adaptation, causality, robustness, prompt, optimization, and generative models
☆1,241 · Dec 14, 2023 · Updated 2 years ago
mlcommons / dataperf
Data Benchmarking
☆25 · May 24, 2024 · Updated 2 years ago
subeeshvasu / Awesome-Learning-with-Label-Noise
A curated list of resources for Learning with Noisy Labels
☆2,716 · May 3, 2025 · Updated last year
daochenzha / Meta-AAD
[ICDM 2020] Meta-AAD: Active Anomaly Detection with Deep Reinforcement Learning
☆56 · Mar 20, 2023 · Updated 3 years ago
MLNLP-World / Paper-Writing-Tips
A repository compiled by the MLNLP community to help everyone avoid minor mistakes in paper submissions. Paper Writing Tips
☆4,566 · May 29, 2022 · Updated 4 years ago
cleanlab / cleanlab
Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data …
☆11,582 · Jan 13, 2026 · Updated 6 months ago
VainF / Awesome-Anything
General AI methods for Anything: AnyObject, AnyGeneration, AnyModel, AnyTask, AnyX
☆1,852 · Nov 15, 2023 · Updated 2 years ago
SinclairCoder / Instruction-Tuning-Papers
Reading list of Instruction-tuning. A trend starts from Natural-Instruction (ACL 2022), FLAN (ICLR 2022) and T0 (ICLR 2022).
☆769 · Jul 20, 2023 · Updated 3 years ago
huggingface / peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
☆21,415 · Jul 14, 2026 · Updated last week
FranxYao / chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
☆2,777 · Aug 4, 2024 · Updated last year
txsun1997 / LMaaS-Papers
Awesome papers on Language-Model-as-a-Service (LMaaS)
☆545 · May 14, 2024 · Updated 2 years ago
salesforce / LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
☆11,253 · Jun 2, 2026 · Updated last month
Timothyxxx / Chain-of-ThoughtsPapers
A trend starts from "Chain of Thought Prompting Elicits Reasoning in Large Language Models".
☆2,106 · Oct 5, 2023 · Updated 2 years ago
OpenRLHF / OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy…
☆9,828 · Jul 14, 2026 · Updated last week
OpenGVLab / LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
☆5,916 · Mar 14, 2024 · Updated 2 years ago
thunlp / PromptPapers
Must-read papers on prompt-based tuning for pre-trained language models.
☆4,320 · Jul 17, 2023 · Updated 3 years ago
huggingface / trl
Train transformer language models with reinforcement learning.
☆18,892 · Updated this week
Eurus-Holmes / Awesome-Multimodal-Research
A curated list of Multimodal Related Research.
☆1,394 · Aug 5, 2023 · Updated 2 years ago
Dao-AILab / flash-attention
Fast and memory-efficient exact attention
☆24,497 · Updated this week
kaixindelele / ChatPaper
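Use ChatGPT to summarize the arXiv papers. Accelerate the whole research workflow: use ChatGPT for full-paper summarization, professional translation, polishing, reviewing, and review responses
☆19,695 · Mar 2, 2026 · Updated 4 months ago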
haotian-liu / LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
☆24,930 · Aug 12, 2024 · Updated last year
daochenzha / dreamshard
[NeurIPS 2022] DreamShard: Generalizable Embedding Table Placement for Recommender Systems
☆28 · Mar 24, 2023 · Updated 3 years ago
magic-research / Dataset_Quantization
[ICCV2023] Dataset Quantization
☆261 · Jan 6, 2024 · Updated 2 years ago
zhoubolei / bolei_awesome_posters
CVPR and NeurIPS poster examples and templates
☆2,020 · May 9, 2023 · Updated 3 years ago
wasiahmad / Awesome-LLM-Synthetic-Data
A reading list on LLM based Synthetic Data Generation 🔥
☆1,542 · Jun 5, 2025 · Updated last year
lm-sys / FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
☆39,494 · May 1, 2026 · Updated 2 months ago
HillZhang1999 / llm-hallucination-survey
Reading list of hallucination in LLMs. Check out our new survey paper: "Siren’s Song in the AI Ocean: A Survey on Hallucination in Large …
☆1,085 · Sep 27, 2025 · Updated 9 months ago