superlinear-ai / microGRPO
A tiny single-file implementation of Group Relative Policy Optimization (GRPO) as introduced by the DeepSeekMath paper
★39 · Updated 7 months ago
Alternatives and similar repositories for microGRPO
Users who are interested in microGRPO are comparing it to the libraries listed below.
- [ICML 2024] Official code release accompanying the paper "diff History for Neural Language Agents" (Piterbarg, Pinto, Fergus) ★20 · Updated last year
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ★184 · Updated 8 months ago
- Simple repository for training small reasoning models ★48 · Updated 11 months ago
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ★68 · Updated 9 months ago
- Learn online intrinsic rewards from LLM feedback ★45 · Updated last year
- An implementation of PPO in Pytorch ★106 · Updated 3 weeks ago
- Skill Design From AI Feedback ★33 · Updated 11 months ago
- Reinforcement learning training framework for entity-gym environments. ★17 · Updated last year
- Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning" ★210 · Updated 2 years ago
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ★125 · Updated 2 months ago
- Minimal RLHF implementation built on top of minGPT. ★32 · Updated last year
- Minimal hackable GRPO implementation ★319 · Updated 11 months ago
- fast + parallel AlphaZero in JAX ★109 · Updated last year
- CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL ★122 · Updated last year
- Official repository of the spotlight ICML 2025 paper, PokeChamp: an Expert-level Minimax Language Agent. ★134 · Updated 3 months ago
- Fast reinforcement learning 💨 ★28 · Updated 6 months ago
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou… ★32 · Updated last year
- Drop-in environment replacements that make your RL algorithm train faster. ★21 · Updated last year
- Flax (Jax) implementation of DeepSeek-R1-Distill-Qwen-1.5B with weights ported from Hugging Face. ★26 · Updated 11 months ago
- ★110 · Updated last year
- ★116 · Updated last week
- Simple and efficient pytorch-native transformer training and inference (batched) ★79 · Updated last year
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ★357 · Updated this week
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ★141 · Updated 8 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ★152 · Updated 11 months ago
- OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code (ICLR 2025). ★73 · Updated last year
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ★78 · Updated last year
- A number of agents (PPO, MuZero) with a Perceiver-based NN architecture that can be trained to achieve goals in nethack/minihack environm… ★43 · Updated 3 years ago
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ★107 · Updated 2 months ago
- ★128 · Updated last year