superlinear-ai / microGRPO
A tiny single-file implementation of Group Relative Policy Optimization (GRPO) as introduced by the DeepSeekMath paper
★39 Updated 6 months ago
Alternatives and similar repositories for microGRPO
Users who are interested in microGRPO are comparing it to the libraries listed below.
- [ICML 2024] Official code release accompanying the paper "diff History for Neural Language Agents" (Piterbarg, Pinto, Fergus) ★20 Updated last year
- An implementation of PPO in PyTorch ★101 Updated last month
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ★67 Updated 8 months ago
- ★116 Updated 3 weeks ago
- Learn online intrinsic rewards from LLM feedback ★45 Updated last year
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ★183 Updated 7 months ago
- Reinforcement learning training framework for entity-gym environments. ★17 Updated last year
- Drop-in environment replacements that make your RL algorithm train faster. ★21 Updated last year
- Repository for the paper Stream of Search: Learning to Search in Language ★152 Updated 10 months ago
- CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL ★120 Updated last year
- Simple repository for training small reasoning models ★47 Updated 10 months ago
- Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning" ★210 Updated 2 years ago
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ★120 Updated last month
- Skill Design From AI Feedback ★33 Updated 10 months ago
- PyTorch Implementation of MuZero Unplugged for gym environment. This algorithm is capable of supporting a wide range of action and observ… ★34 Updated 6 months ago
- ★108 Updated last year
- Cost-aware hyperparameter tuning algorithm ★176 Updated last year
- ★130 Updated 2 weeks ago
- Flax (Jax) implementation of DeepSeek-R1-Distill-Qwen-1.5B with weights ported from Hugging Face. ★26 Updated 10 months ago
- Neuroevolution Benchmark in JAX ★42 Updated 2 years ago
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou… ★32 Updated last year
- Implementation of Soft Actor Critic and some of its improvements in PyTorch ★60 Updated last week
- Fast reinforcement learning ★28 Updated 5 months ago
- This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" ★277 Updated last month
- Minimal hackable GRPO implementation ★306 Updated 10 months ago
- NeurIPS 2024 tutorial on LLM Inference ★47 Updated last year
- ★160 Updated last year
- A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks ★36 Updated last year
- OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code (ICLR 2025). ★72 Updated last year
- Minimal but scalable implementation of large language models in JAX ★35 Updated last month