open-thought / tiny-grpo

Minimal hackable GRPO implementation

☆294

Alternatives and similar repositories for tiny-grpo

Users who are interested in tiny-grpo are comparing it to the libraries listed below.

joey00072 / nanoGRPO
nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)
☆121 Updated 5 months ago
shangshang-wang / Tina
Tina: Tiny Reasoning Models via LoRA
☆299 Updated last month
sail-sg / oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
☆539 Updated this week
knoveleng / open-rs
Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't"
☆266 Updated last week
facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…
☆342 Updated 10 months ago
sunblaze-ucb / Intuitor
Code for the paper: "Learning to Reason without External Rewards"
☆366 Updated 3 months ago
Mohammadjafari80 / GSM8K-RLVR
A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.
☆136 Updated 8 months ago
waterhorse1 / LLM_Tree_Search
(ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training
☆283 Updated last year
brendanhogan / DeepSeekRL-Extended
Exploring Applications of GRPO
☆248 Updated 2 months ago
RyanLiu112 / compute-optimal-tts
Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
☆273 Updated 8 months ago
SimpleBerry / LLaMA-O1
Large Reasoning Models
☆805 Updated 10 months ago
vwxyzjn / summarize_from_feedback_details
☆152 Updated 11 months ago
eddycmu / demystify-long-cot
☆323 Updated 4 months ago
ekinakyurek / marc
Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning"
☆330 Updated 11 months ago
fangyuan-ksgk / Tiny-GRPO
Minimal GRPO implementation from scratch
☆98 Updated 7 months ago
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MOE models.
☆202 Updated 7 months ago
Gen-Verse / ReasonFlux
[NeurIPS 2025 Spotlight] ReasonFlux Series - ReasonFlux, ReasonFlux-PRM and ReasonFlux-Coder
☆492 Updated last month
NVIDIA-NeMo / Skills
A project to improve skills of large language models
☆587 Updated last week
sail-sg / understand-r1-zero
Understanding R1-Zero-Like Training: A Critical Perspective
☆1,126 Updated 2 months ago
huggingface / Math-Verify
☆971 Updated 3 months ago
ADaM-BJTU / O1-CODER
An O1 replication for coding
☆336 Updated 10 months ago
McGill-NLP / nano-aha-moment
Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"
☆538 Updated 2 weeks ago
THUDM / ReST-MCTS
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)
☆673 Updated 9 months ago
LeonGuertler / TextArena
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
☆291 Updated 3 weeks ago
PRIME-RL / TTRL
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
☆864 Updated last month
NovaSky-AI / SkyRL
SkyRL: A Modular Full-stack RL Library for LLMs
☆1,101 Updated this week
facebookresearch / RAM
A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).
☆295 Updated this week
zhentingqi / rStar
☆964 Updated 9 months ago
vwxyzjn / lm-human-preference-details
RLHF implementation details of OAI's 2019 codebase
☆193 Updated last year
QwenLM / ParScale
Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling
☆448 Updated 5 months ago