RLHFlow / RAFT
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or rejection sampling fine-tuning.
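Conceptually, one RAFT iteration samples n completions per prompt from the current policy, ranks them with a reward model, keeps the best completion per prompt, and fine-tunes the policy on those winners before repeating. The sketch below illustrates that loop with Hugging Face `transformers`; it is not this repo's actual training code, and `raft_round`, `reward_fn`, and all sampling hyperparameters are illustrative assumptions.

```python
# A minimal sketch of one RAFT (best-of-n / rejection sampling fine-tuning)
# round. NOT the repo's implementation; `reward_fn` is a hypothetical
# stand-in for a trained reward model.
from transformers import AutoModelForCausalLM, AutoTokenizer

def raft_round(policy, tokenizer, reward_fn, prompts, n=8, max_new_tokens=128):
    """Return the highest-reward completion per prompt for the next SFT step."""
    best = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(policy.device)
        # 1) Sample n candidate completions from the current policy.
        out = policy.generate(
            **inputs,
            do_sample=True,
            top_p=0.9,
            num_return_sequences=n,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
        # Strip the prompt tokens; keep only the generated continuations.
        completions = tokenizer.batch_decode(
            out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        # 2) Rank the candidates by reward and keep the best one.
        winner, _ = max(
            ((c, reward_fn(prompt, c)) for c in completions),
            key=lambda pair: pair[1],
        )
        best.append({"prompt": prompt, "completion": winner})
    # 3) Fine-tune the policy on `best` with the usual SFT cross-entropy
    #    loss, then repeat from step 1 with the updated policy.
    return best

# Example wiring (model names and reward are illustrative):
# policy = AutoModelForCausalLM.from_pretrained("gpt2")
# tokenizer = AutoTokenizer.from_pretrained("gpt2")
# data = raft_round(policy, tokenizer, lambda p, c: len(c), ["Hello,"])
```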
Related projects
Alternatives and complementary repositories for RAFT
- Source code for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023).
- Official implementation of "ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting".
- Directional Preference Alignment.
- Official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning" with multi-turn DP…
- Code for the ACL 2024 paper "Adversarial Preference Optimization" (APO).
- Domain-specific preference (DSP) data and customized RM fine-tuning.
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning.
- [ACL 2024] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization.
- Long Context Extension and Generalization in LLMs.
- A Kernel-Based View of Language Model Fine-Tuning (https://arxiv.org/abs/2210.05643).
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
- [ACL 2023 Findings] What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning.
- Explores what LLMs are really learning during SFT.
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision.
- Methods and evaluation for aligning language models temporally.
- Code for most of the experiments in the paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity".
- Self-Supervised Alignment with Mutual Information.
- Official implementation of "Bootstrapping Language Models via DPO Implicit Rewards".
- Code for the paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning".
- Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data".
- A curated list of awesome resources dedicated to Scaling Laws for LLMs.
- Teaching Models to Express Their Uncertainty in Words.