shizhediao/Post-Training-Data-Flywheel

We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs.

☆66

Alternatives and similar repositories for Post-Training-Data-Flywheel

Users that are interested in Post-Training-Data-Flywheel are comparing it to the libraries listed below.

hanningzhang / prm
☆17 · Nov 3, 2024 · Updated last year
pipilurj / MLLM-protector
The official repository for paper "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance"
☆46 · Apr 21, 2024 · Updated 2 years ago
KodCode-AI / code-r1
Reproducing R1 for Code with Reliable Rewards
☆13 · Apr 9, 2025 · Updated last year
riejohnson / cfg-gan
CFG-GAN: Composite functional gradient learning of generative adversarial models
☆15 · Jul 9, 2020 · Updated 6 years ago
2003pro / ScaleBiO
This is the official implementation of ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
☆25 · Jul 30, 2024 · Updated last year
pipilurj / ROBOT
☆27 · Apr 11, 2023 · Updated 3 years ago
fanqiwan / Explore-Instruct
EMNLP'2023: Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration
☆36 · Mar 10, 2024 · Updated 2 years ago
quanshr / AugCon
[AAAI 2025] Automatically Generating Numerous Context-Driven SFT Data for LLMs across Diverse Granularity
☆30 · Mar 17, 2025 · Updated last year
RLHFlow / Directional-Preference-Alignment
Directional Preference Alignment
☆62 · Sep 23, 2024 · Updated last year
HypherX / Evolution-Analysis
☆25 · Dec 13, 2024 · Updated last year
yhao-wang / LLM-Knowledge-Boundary
Implementation of "Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation"
☆21 · Jul 31, 2023 · Updated 2 years ago
hendrydong / NTK-and-MF-examples
Visualization of mean field and neural tangent kernel regime
☆23 · Jul 25, 2024 · Updated 2 years ago
RLHFlow / Online-DPO-R1
Codebase for Iterative DPO Using Rule-based Rewards
☆275 · Apr 11, 2025 · Updated last year
icip-cas / EntityMatcher
☆18 · Jun 17, 2024 · Updated 2 years ago
RLHFlow / RAFT
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…
☆43 · Sep 22, 2024 · Updated last year
OpenLMLab / ParallelTokenizer
Use the tokenizer in parallel to achieve superior acceleration
☆20 · Mar 21, 2024 · Updated 2 years ago
RLHFlow / Online-RLHF
A recipe for online RLHF and online iterative DPO.
☆544 · Dec 28, 2024 · Updated last year
shizhediao / automate-cot
Source code for the paper "Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data"
☆20 · Feb 24, 2024 · Updated 2 years ago
shizhediao / active-prompt
Source code for the paper "Active Prompting with Chain-of-Thought for Large Language Models"
☆249 · May 7, 2024 · Updated 2 years ago
yegcjs / mixinglaws
☆113 · Jul 15, 2025 · Updated last year
GAIR-NLP / OlympicArena
[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
☆106 · Mar 6, 2025 · Updated last year
sustech-nlp / SPPO
[ACL 2026 Oral] Official repository for SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks.
☆26 · May 18, 2026 · Updated 2 months ago
pipilurj / bootstrapped-preference-optimization-BPO
Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"
☆63 · Aug 23, 2024 · Updated last year
icip-cas / SSO
A scalable automated alignment method for large language models. Resources for "Aligning Large Language Models via Self-Steering Optimiza…
☆20 · Nov 21, 2024 · Updated last year
ZubinGou / math-evaluation-harness
A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨
☆278 · Apr 26, 2024 · Updated 2 years ago
yiqingxyq / RepoST
Code for "[COLM'25] RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing"
☆24 · Mar 18, 2025 · Updated last year
hkust-nlp / deita
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
☆600 · Dec 9, 2024 · Updated last year
kaistAI / InstructIR
InstructIR, a novel benchmark specifically designed to evaluate the instruction-following ability of information retrieval models. Our foc…
☆32 · Jun 13, 2024 · Updated 2 years ago
LLM360 / MegaMath
[COLM 2025] An Open Math Pre-training Dataset with 370B Tokens.
☆110 · Apr 4, 2025 · Updated last year
yafuly / CoGnition
☆17 · Nov 10, 2021 · Updated 4 years ago
general-preference / general-preference-model
[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment (https://arxiv.org/abs/2410.02197)
☆43 · Jun 15, 2026 · Updated last month
McGill-NLP / CHASE
Synthetic Data Generation for Evaluation
☆16 · Feb 21, 2025 · Updated last year
dxhou / CoAct
☆32 · Jul 8, 2024 · Updated 2 years ago
shizhediao / R-Tuning
[NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't…
☆137 · Jul 10, 2024 · Updated 2 years ago
RUCAIBox / LLM-Knowledge-Boundary
Implementation of "Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation"
☆82 · Jul 31, 2023 · Updated 2 years ago
hanqi-qi / Mirror
☆14 · Feb 26, 2024 · Updated 2 years ago
wangitu / Ada-Instruct
☆17 · Apr 10, 2024 · Updated 2 years ago
RLHFlow / RLHF-Reward-Modeling
Recipes to train reward model for RLHF.
☆1,535 · Apr 24, 2025 · Updated last year
BaohaoLiao / frac-cot
[COLM 2026] An efficient 3D sampling method for long-CoT LLM.
☆16 · May 25, 2025 · Updated last year