open-thought / tiny-grpo
Minimal hackable GRPO implementation
☆303 · Updated 10 months ago
Alternatives and similar repositories for tiny-grpo
Users who are interested in tiny-grpo are comparing it to the libraries listed below.
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) · ☆126 · Updated 6 months ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. · ☆576 · Updated last month
- Tina: Tiny Reasoning Models via LoRA · ☆309 · Updated 2 months ago
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" · ☆561 · Updated 2 months ago
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" · ☆267 · Updated last month
- Understanding R1-Zero-Like Training: A Critical Perspective · ☆1,164 · Updated 3 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… · ☆360 · Updated 11 months ago
- Exploring Applications of GRPO · ☆249 · Updated 3 months ago
- minimal GRPO implementation from scratch · ☆100 · Updated 8 months ago
- Code for the paper: "Learning to Reason without External Rewards" · ☆380 · Updated 4 months ago
- (ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training · ☆283 · Updated last year
- Notes and commented code for RLHF (PPO) · ☆118 · Updated last year
- An extension of the nanoGPT repository for training small MOE models. · ☆215 · Updated 8 months ago
- ☆100 · Updated 5 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" · ☆340 · Updated 3 weeks ago
- rl from zero pretrain, can it be done? yes. · ☆281 · Updated 2 months ago
- ☆327 · Updated 6 months ago
- A simplified implementation for experimenting with RLVR on GSM8K, This repository provides a starting point for exploring reasoning. · ☆145 · Updated 10 months ago
- Large Reasoning Models · ☆807 · Updated last year
- [NeurIPS 2025] TTRL: Test-Time Reinforcement Learning · ☆908 · Updated 2 months ago
- ☆224 · Updated last week
- A Gym for Agentic LLMs · ☆371 · Updated 3 weeks ago
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling". · ☆276 · Updated 9 months ago
- A project to improve skills of large language models · ☆628 · Updated this week
- ☆1,015 · Updated 5 months ago
- Deepseek R1 zero tiny version own reproduce on two A100s. · ☆77 · Updated 10 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning · ☆319 · Updated last month
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch · ☆180 · Updated 5 months ago
- ☆463 · Updated 3 months ago
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example · ☆383 · Updated 2 weeks ago