hkproj / rlhf-ppo
Notes and commented code for RLHF (PPO)
☆110 Updated last year
Alternatives and similar repositories for rlhf-ppo
Users who are interested in rlhf-ppo are comparing it to the repositories listed below.
- Advanced NLP, Spring 2025 https://cmu-l3.github.io/anlp-spring2025/ ☆66 Updated 6 months ago
- Minimal hackable GRPO implementation ☆286 Updated 8 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆120 Updated 4 months ago
- ☆96 Updated last year
- minimal GRPO implementation from scratch ☆98 Updated 6 months ago
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆261 Updated 4 months ago
- Direct Preference Optimization from scratch in PyTorch ☆112 Updated 5 months ago
- LLaMA 2 implemented from scratch in PyTorch ☆353 Updated 2 years ago
- A project to improve skills of large language models ☆568 Updated last week
- Tina: Tiny Reasoning Models via LoRA ☆284 Updated last week
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022) ☆211 Updated 2 years ago
- ☆370 Updated 9 months ago
- [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning ☆362 Updated last year
- Large Reasoning Models ☆804 Updated 10 months ago
- ☆318 Updated 4 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆107 Updated 2 months ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. ☆217 Updated 2 months ago
- Official repository for ORPO ☆464 Updated last year
- ☆129 Updated last year
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆472 Updated 3 weeks ago
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning. ☆129 Updated 7 months ago
- ☆297 Updated 4 months ago
- ☆341 Updated 4 months ago
- [MathCoder, MathCoder-VL] Family of LLMs/LMMs for mathematical reasoning. ☆319 Updated last month
- Survey of Small Language Models from Penn State, ... ☆202 Updated last month
- A Framework for LLM-based Multi-Agent Reinforced Training and Inference ☆276 Updated this week
- Code for the paper: "Learning to Reason without External Rewards" ☆357 Updated 2 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆258 Updated last year
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆256 Updated 4 months ago
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems ☆91 Updated 6 months ago