hkproj / rlhf-ppo
Notes and commented code for RLHF (PPO)
☆101 Updated last year
Alternatives and similar repositories for rlhf-ppo
Users who are interested in rlhf-ppo are comparing it to the libraries listed below.
- ☆90 Updated 10 months ago
- Minimal hackable GRPO implementation ☆274 Updated 6 months ago
- Direct Preference Optimization from scratch in PyTorch ☆103 Updated 3 months ago
- Advanced NLP, Spring 2025 https://cmu-l3.github.io/anlp-spring2025/ ☆61 Updated 4 months ago
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆248 Updated 2 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆113 Updated 2 months ago
- Minimal GRPO implementation from scratch ☆94 Updated 4 months ago
- A project to improve skills of large language models ☆501 Updated this week
- LLaMA 2 implemented from scratch in PyTorch ☆343 Updated last year
- Survey of Small Language Models from Penn State, ... ☆186 Updated 2 weeks ago
- ☆309 Updated 2 months ago
- Research code for the preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning" ☆100 Updated 3 weeks ago
- ☆129 Updated last year
- ☆608 Updated 3 weeks ago
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning. ☆119 Updated 5 months ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. ☆200 Updated last week
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022) ☆206 Updated 2 years ago
- ☆263 Updated last month
- [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning ☆360 Updated 10 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆234 Updated 2 months ago
- Tina: Tiny Reasoning Models via LoRA ☆274 Updated 2 months ago
- ☆258 Updated 2 months ago
- ☆166 Updated 3 months ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆132 Updated last year
- A curated collection of LLM reasoning and planning resources, including key papers, limitations, benchmarks, and additional learning mate… ☆285 Updated 5 months ago
- ☆128 Updated 4 months ago
- Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks