superlinear-ai / microGRPO
A tiny single-file implementation of Group Relative Policy Optimization (GRPO) as introduced by the DeepSeekMath paper
☆39 Updated 7 months ago
Alternatives and similar repositories for microGRPO
Users who are interested in microGRPO are comparing it to the libraries listed below.
- Learn online intrinsic rewards from LLM feedback ☆45 Updated last year
- [ICML 2024] Official code release accompanying the paper "diff History for Neural Language Agents" (Piterbarg, Pinto, Fergus) ☆20 Updated last year
- CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL ☆122 Updated last year
- Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning" ☆211 Updated 2 years ago
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ☆68 Updated 9 months ago
- PyTorch Implementation of MuZero Unplugged for gym environment. This algorithm is capable of supporting a wide range of action and observ… ☆35 Updated 7 months ago
- Reinforcement learning training framework for entity-gym environments. ☆17 Updated last year
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou… ☆32 Updated last year
- ICLR 2021: "Monte-Carlo Planning and Learning with Language Action Value Estimates" ☆33 Updated 2 years ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆185 Updated 8 months ago
- Interpreting how transformers simulate agents performing RL tasks ☆90 Updated 2 years ago
- Simple repository for training small reasoning models ☆49 Updated last year
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ☆130 Updated 2 months ago
- Dataset and benchmark for assessing LLMs in translating natural language descriptions of planning problems into PDDL ☆64 Updated last year
- OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code (ICLR 2025). ☆73 Updated last year
- ☆110 Updated last year
- ☆136 Updated 2 months ago
- fast + parallel AlphaZero in JAX ☆109 Updated last year
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆78 Updated last year
- Various reinforcement learning algorithms written in JAX + Flax ☆26 Updated 2 years ago
- Official implementation of "Direct Preference-based Policy Optimization without Reward Modeling" (NeurIPS 2023) ☆42 Updated last year
- A number of agents (PPO, MuZero) with a Perceiver-based NN architecture that can be trained to achieve goals in nethack/minihack environm… ☆43 Updated 3 years ago
- Skill Design From AI Feedback ☆33 Updated 11 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆153 Updated last year
- A2C is a special case of PPO! ☆22 Updated 3 years ago
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ☆361 Updated this week
- Neuroevolution Benchmark in JAX ☆42 Updated 2 years ago
- unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" ☆83 Updated 3 years ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆143 Updated 9 months ago
- ☆185 Updated 2 years ago