joey00072 / nanoGRPO
nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)
☆103 · Updated 3 weeks ago
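For orientation, the core idea behind GRPO is that each prompt gets a group of sampled completions, and each completion's advantage is its reward normalized against the rest of its group, so no learned value baseline is needed. The snippet below is a minimal sketch of that normalization step only. It is not nanoGRPO's code. The function name `group_relative_advantages`, the `eps` parameter, and the use of the population standard deviation are illustrative assumptions; implementations differ on details such as sample vs. population std.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Hypothetical helper: normalize rewards within one prompt's group.

    Each advantage is (r_i - mean(group)) / (std(group) + eps).
    Uses the population standard deviation (an assumption; some
    implementations use the sample standard deviation instead).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 completions for one prompt, reward 1.0 if the final answer is correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Correct completions get a positive advantage and incorrect ones a negative advantage. If every completion in a group earns the same reward, all advantages are zero and that group contributes no policy-gradient signal.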
Alternatives and similar repositories for nanoGRPO:
Users that are interested in nanoGRPO are comparing it to the libraries listed below
- minimal GRPO implementation from scratch · ☆87 · Updated last month
- ☆109 · Updated 3 months ago
- A simplified implementation for experimenting with Reinforcement Learning (RL) on GSM8K, inspired by RLVR and Deepseek R1. This repositor… · ☆84 · Updated 3 months ago
- ☆114 · Updated 2 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" · ☆141 · Updated 2 weeks ago
- ☆138 · Updated 5 months ago
- ☆97 · Updated 10 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples · ☆85 · Updated last month
- An extension of the nanoGPT repository for training small MOE models. · ☆138 · Updated 2 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks · ☆143 · Updated 7 months ago
- ☆92 · Updated 7 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. · ☆172 · Updated 3 months ago
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models · ☆215 · Updated this week
- ☆57 · Updated 9 months ago
- RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct · ☆28 · Updated 2 months ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems · ☆90 · Updated 2 months ago
- This is the official repository for Inheritune. · ☆111 · Updated 2 months ago
- A brief and partial summary of RLHF algorithms. · ☆128 · Updated 2 months ago
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. · ☆100 · Updated this week
- Implementation of the Quiet-STAR paper (https://arxiv.org/pdf/2403.09629.pdf) · ☆53 · Updated 9 months ago
- A (somewhat) minimal library for finetuning language models with PPO on human feedback. · ☆85 · Updated 2 years ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" · ☆158 · Updated 10 months ago
- ☆78 · Updated 8 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). · ☆226 · Updated this week
- Minimal hackable GRPO implementation · ☆217 · Updated 3 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models · ☆51 · Updated 2 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". · ☆94 · Updated last month
- Exploring Applications of GRPO · ☆189 · Updated last week
- EvaByte: Efficient Byte-level Language Models at Scale · ☆91 · Updated 2 weeks ago
- The official implementation of Self-Exploring Language Models (SELM) · ☆64 · Updated 11 months ago