hkproj / rlhf-ppo

Notes and commented code for RLHF (PPO)

☆101

Alternatives and similar repositories for rlhf-ppo

Users who are interested in rlhf-ppo compare it to the repositories listed below.


neubig / minllama-assignment
☆90 Updated 10 months ago
open-thought / tiny-grpo
Minimal hackable GRPO implementation
☆274 Updated 6 months ago
0xallam / Direct-Preference-Optimization
Direct Preference Optimization from scratch in PyTorch
☆103 Updated 3 months ago
cmu-l3 / anlp-spring2025-code
Advanced NLP, Spring 2025 https://cmu-l3.github.io/anlp-spring2025/
☆61 Updated 4 months ago
knoveleng / open-rs
Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't"
☆248 Updated 2 months ago
joey00072 / nanoGRPO
nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)
☆113 Updated 2 months ago
fangyuan-ksgk / Tiny-GRPO
minimal GRPO implementation from scratch
☆94 Updated 4 months ago
NVIDIA / NeMo-Skills
A project to improve skills of large language models
☆501 Updated this week
hkproj / pytorch-llama
LLaMA 2 implemented from scratch in PyTorch
☆343 Updated last year
FairyFali / SLMs-Survey
Survey of Small Language Models from Penn State, ...
☆186 Updated 2 weeks ago
eddycmu / demystify-long-cot
☆309 Updated 2 months ago
CMU-AIRe / MRT
Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".
☆100 Updated 3 weeks ago
BrendanGraham14 / mcts-llm
☆129 Updated last year
microsoft / rStar
☆608 Updated 3 weeks ago
Mohammadjafari80 / GSM8K-RLVR
A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.
☆119 Updated 5 months ago
RUC-GSAI / YuLan-Mini
A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.
☆200 Updated last week
ezelikman / STaR
Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022)
☆206 Updated 2 years ago
ai-agents-2030 / awesome-deep-research-agent
☆263 Updated last month
tianyi-lab / Reflection_Tuning
[ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
☆360 Updated 10 months ago
cmu-l3 / l1
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
☆234 Updated 2 months ago
shangshang-wang / Tina
Tina: Tiny Reasoning Models via LoRA
☆274 Updated 2 months ago
GAIR-NLP / ToRL
☆258 Updated 2 months ago
ReTool-RL / ReTool
☆166 Updated 3 months ago
tongyx361 / Awesome-LLM4Math
Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit…
☆132 Updated last year
samkhur006 / awesome-llm-planning-reasoning
A curated collection of LLM reasoning and planning resources, including key papers, limitations, benchmarks, and additional learning mate…
☆285 Updated 5 months ago
DataArcTech / LLM-as-a-Judge
☆128 Updated 4 months ago
facebookresearch / sweet_rl
Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks"
☆233 Updated 3 months ago
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MOE models.
☆164 Updated 4 months ago
waterhorse1 / LLM_Tree_Search
(ICML 2024) AlphaZero-like Tree-Search can guide large language model decoding and training
☆278 Updated last year
RyanLiu112 / compute-optimal-tts
Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
☆268 Updated 5 months ago