jianzhnie/awesome-instruction-datasets

jianzhnie / awesome-instruction-datasets

A collection of prompt datasets and instruction datasets of all kinds for training chat LLMs such as ChatGPT.

☆738

Alternatives and similar repositories for awesome-instruction-datasets

Users who are interested in awesome-instruction-datasets are comparing it to the repositories listed below.

yaodongC / awesome-instruction-dataset
A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca)
☆1,153 · Jan 4, 2024 · Updated 2 years ago
zhilizju / Awesome-instruction-tuning
A curated list of awesome instruction tuning datasets, models, papers and repositories.
☆346 · Jun 12, 2023 · Updated 3 years ago
raunak-agarwal / instruction-datasets
Datasets for Instruction Tuning of Large Language Models
☆261 · Nov 30, 2023 · Updated 2 years ago
Zjh-819 / LLMDataHub
A quick guide (especially) for trending instruction finetuning datasets
☆3,408 · Nov 28, 2023 · Updated 2 years ago
yizhongw / self-instruct
Aligning pretrained language models with instruction data generated by themselves.
☆4,607 · Mar 27, 2023 · Updated 3 years ago
RenzeLou / awesome-instruction-learning
Papers and Datasets on Instruction Tuning and Following.
☆512 · Apr 4, 2024 · Updated 2 years ago
EleutherAI / lm-evaluation-harness
A framework for few-shot evaluation of language models.
☆13,443 · Jul 13, 2026 · Updated 2 weeks ago
hkust-nlp / deita
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR 2024]
☆600 · Dec 9, 2024 · Updated last year
PhoebusSi / Alpaca-CoT
We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs and parameter-efficient methods (e.g., lora, p-tunin…
☆2,791 · Dec 12, 2023 · Updated 2 years ago
thunlp / UltraChat
Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)
☆2,877 · Mar 13, 2024 · Updated 2 years ago
opendilab / awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
☆4,417 · May 20, 2026 · Updated 2 months ago
huggingface / alignment-handbook
Robust recipes to align language models with human and AI preferences
☆5,651 · May 26, 2026 · Updated 2 months ago
Instruction-Tuning-with-GPT-4 / GPT-4-LLM
Instruction Tuning with GPT-4
☆4,333 · Jun 11, 2023 · Updated 3 years ago
FranxYao / chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
☆2,776 · Aug 4, 2024 · Updated last year
XueFuzhao / InstructionWild
☆462 · Jun 9, 2024 · Updated 2 years ago
OpenRLHF / OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy…
☆9,855 · Jul 14, 2026 · Updated 2 weeks ago
allenai / open-instruct
AllenAI's post-training codebase
☆3,811 · Updated this week
huggingface / trl
Train transformer language models with reinforcement learning.
☆18,953 · Updated this week
OpenLMLab / MOSS-RLHF
Secrets of RLHF in Large Language Models Part I: PPO
☆1,426 · Mar 3, 2024 · Updated 2 years ago
CarperAI / trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
☆4,752 · Jan 8, 2024 · Updated 2 years ago
tatsu-lab / alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
☆2,007 · Aug 9, 2025 · Updated 11 months ago
allenai / natural-instructions
Expanding natural instructions
☆1,045 · Dec 11, 2023 · Updated 2 years ago
jianzhnie / LLamaTuner
Easy and efficient fine-tuning of LLMs (supports LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.
☆620 · Jan 24, 2025 · Updated last year
anthropics / hh-rlhf
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
☆1,853 · Jun 17, 2025 · Updated last year
tatsu-lab / stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
☆30,246 · Jul 17, 2024 · Updated 2 years ago
open-compass / opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, …
☆7,241 · Updated this week
GAIR-NLP / O1-Journey
O1 Replication Journey
☆2,001 · Jan 14, 2025 · Updated last year
nlpxucan / WizardLM
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath
☆9,481 · Jun 7, 2025 · Updated last year
google-research / FLAN
☆1,566 · Jul 2, 2026 · Updated 3 weeks ago
hendrycks / test
Measuring Massive Multitask Language Understanding | ICLR 2021
☆1,603 · May 28, 2023 · Updated 3 years ago
huggingface / peft
PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
☆21,460 · Updated this week
nickrosh / evol-teacher
Open Source WizardCoder Dataset
☆166 · Jul 12, 2023 · Updated 3 years ago
lm-sys / FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
☆39,505 · May 1, 2026 · Updated 2 months ago
salesforce / DialogStudio
DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection and Instruction-Aware Models for Conversational AI
☆526 · Jun 2, 2026 · Updated last month
arcee-ai / mergekit
Tools for merging pretrained large language models.
☆7,266 · Jun 17, 2026 · Updated last month
princeton-nlp / LESS
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
☆532 · Oct 20, 2024 · Updated last year
magpie-align / magpie
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data …
☆875 · Mar 17, 2025 · Updated last year
OFA-Sys / InsTag
InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning
☆288 · Aug 20, 2023 · Updated 2 years ago
CLUEbenchmark / SuperCLUE-Math6
SuperCLUE-Math6: Exploring a new generation of native Chinese multi-turn, multi-step mathematical reasoning datasets
☆60 · Feb 5, 2024 · Updated 2 years ago