hkproj / rlhf-ppo
Notes and commented code for RLHF (PPO)
☆94Updated last year
Alternatives and similar repositories for rlhf-ppo
Users who are interested in rlhf-ppo are comparing it to the repositories listed below
- Advanced NLP, Spring 2025 (https://cmu-l3.github.io/anlp-spring2025/) ☆53Updated 2 months ago
- ☆87Updated 8 months ago
- Direct Preference Optimization from scratch in PyTorch☆92Updated last month
- minimal GRPO implementation from scratch☆90Updated 2 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆105Updated 3 weeks ago
- ☆198Updated last week
- Minimal hackable GRPO implementation☆232Updated 4 months ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.☆182Updated this week
- [NeurIPS 2024] Agent Planning with World Knowledge Model☆136Updated 5 months ago
- A brief and partial summary of RLHF algorithms.☆128Updated 2 months ago
- augmented LLM with self reflection☆122Updated last year
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't"☆231Updated 3 weeks ago
- ☆140Updated 6 months ago
- Distributed training (multi-node) of a Transformer model☆68Updated last year
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆213Updated 3 weeks ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit…☆129Updated 10 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.☆314Updated 9 months ago
- A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)☆170Updated 4 months ago
- ☆124Updated 11 months ago
- Awesome LLM Self-Consistency: a curated list of Self-consistency in Large Language Models☆97Updated 9 months ago
- ☆113Updated 4 months ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…☆102Updated last year
- ☆201Updated 3 months ago
- An extension of the nanoGPT repository for training small MOE models.☆147Updated 2 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate"☆151Updated last month
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"☆177Updated last month
- A Comprehensive Survey on Long Context Language Modeling☆147Updated 2 weeks ago
- official repository for “Reinforcement Learning for Reasoning in Large Language Models with One Training Example”☆257Updated this week
- ☆102Updated 5 months ago
- [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale☆248Updated 2 weeks ago