RLHFlow / RAFT
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or rejection sampling fine-tuning.
★37 · Updated last year
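The idea behind RAFT can be summarized in one loop: sample n candidate responses per prompt, rank them with a reward model, keep only the top-ranked ones, and fine-tune on that filtered set before repeating. A minimal sketch of one such round is below; `generate`, `reward`, and the fine-tuning step are hypothetical stand-ins for a policy model's sampler, a reward model, and an SFT update, not the API of this repository.

```python
import random

def generate(prompt, n):
    # Stand-in sampler: in practice, n responses drawn from the policy model.
    return [f"{prompt} -> candidate {i} ({random.random():.2f})" for i in range(n)]

def reward(prompt, response):
    # Stand-in reward model: here just a toy score based on response length.
    return len(response)

def raft_round(prompts, n=8, top_k=1):
    """One RAFT iteration: best-of-n sampling filtered by reward rank."""
    dataset = []
    for prompt in prompts:
        candidates = generate(prompt, n)
        ranked = sorted(candidates, key=lambda r: reward(prompt, r), reverse=True)
        dataset.extend((prompt, r) for r in ranked[:top_k])
    # Fine-tune the policy on `dataset` (omitted), then repeat with fresh samples.
    return dataset
```

Iterating this loop is what makes the method "iterative best-of-n": each round's fine-tuned policy generates the next round's candidates.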
Alternatives and similar repositories for RAFT
Users interested in RAFT are comparing it to the repositories listed below.
- [NeurIPS'24] Official code for *🎯 DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ★115 · Updated 10 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ★73 · Updated 2 weeks ago
- The official repository of the Omni-MATH benchmark ★88 · Updated 10 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing" ★82 · Updated 9 months ago
- The rule-based evaluation subset and code implementation of Omni-MATH ★23 · Updated 10 months ago
- Official repository for the ACL 2025 paper "Model Extrapolation Expedites Alignment" ★75 · Updated 5 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ★123 · Updated last year
- ★69 · Updated last year
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project: Diving into Self-Evolving Training for Multimodal Reasoning ★69 · Updated 3 months ago
- GenRM-CoT: Data release for verification rationales ★67 · Updated last year
- Directional Preference Alignment ★57 · Updated last year
- ★58 · Updated last year
- [NeurIPS 2024] The official implementation of the paper "Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs" ★130 · Updated 7 months ago
- Resources for the Enigmata Project ★72 · Updated 2 months ago
- Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping ★54 · Updated 5 months ago
- Repository for the paper "Free Process Rewards without Process Labels" ★164 · Updated 7 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ★114 · Updated 5 months ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling ★177 · Updated 3 months ago
- ★211 · Updated 8 months ago
- [ACL 2025] We introduce ScaleQuest, a scalable, novel, and cost-effective data synthesis method to unleash the reasoning capability of LLMs ★68 · Updated 11 months ago
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ★61 · Updated 10 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ★65 · Updated 7 months ago
- [NeurIPS 2025] General Reasoner: Advancing LLM Reasoning Across All Domains ★185 · Updated 4 months ago
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning ★46 · Updated 4 months ago
- Code for the ICLR 2025 paper "What is Wrong with Perplexity for Long-context Language Modeling?" ★102 · Updated last week
- RL Scaling and Test-Time Scaling (ICML 2025) ★111 · Updated 9 months ago
- Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards" (NeurIPS 2024) ★57 · Updated 11 months ago
- Official repository for the ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ★174 · Updated 5 months ago
- ★63 · Updated 4 months ago
- A Sober Look at Language Model Reasoning ★85 · Updated 2 weeks ago