hkproj / rlhf-ppo
Notes and commented code for RLHF (PPO)
☆96Updated last year
Alternatives and similar repositories for rlhf-ppo
Users that are interested in rlhf-ppo are comparing it to the libraries listed below
- Direct Preference Optimization from scratch in PyTorch☆98Updated 2 months ago
- minimal GRPO implementation from scratch☆90Updated 3 months ago
- Advanced NLP, Spring 2025 https://cmu-l3.github.io/anlp-spring2025/☆55Updated 2 months ago
- A Comprehensive Survey on Long Context Language Modeling☆152Updated 2 weeks ago
- ☆220Updated last month
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆94Updated 3 months ago
- Minimal hackable GRPO implementation☆247Updated 4 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆105Updated last month
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit…☆130Updated 11 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.☆318Updated 10 months ago
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't"☆238Updated last month
- ☆300Updated 3 weeks ago
- Survey of Small Language Models from Penn State, ...☆183Updated last month
- ☆203Updated 4 months ago
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆219Updated last month
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆241Updated 2 months ago
- ☆119Updated last month
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022)☆206Updated 2 years ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆222Updated last month
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.☆189Updated 3 weeks ago
- ☆102Updated 6 months ago
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems☆84Updated 3 months ago
- [NeurIPS 2024] Agent Planning with World Knowledge Model☆141Updated 6 months ago
- ☆125Updated last year
- augmented LLM with self reflection☆126Updated last year
- A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)☆171Updated 5 months ago
- An extension of the nanoGPT repository for training small MOE models.☆152Updated 3 months ago
- ☆114Updated 5 months ago
- Tina: Tiny Reasoning Models via LoRA☆260Updated 3 weeks ago
- ☆288Updated 11 months ago