joey00072 / nanoGRPO
nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)
☆143 · May 8, 2025 · Updated 9 months ago
Alternatives and similar repositories for nanoGRPO
Users that are interested in nanoGRPO are comparing it to the libraries listed below
- A Pytorch Lightning WGAN-gp to generate faces ☆11 · Jan 26, 2021 · Updated 5 years ago
- minimal GRPO implementation from scratch ☆103 · Mar 14, 2025 · Updated 11 months ago
- working implementation of deepseek MLA ☆45 · Jan 8, 2025 · Updated last year
- Official Implementation of `An Optimisation Framework for Unsupervised Environment Design` from RLC 2025 ☆17 · Nov 24, 2025 · Updated 2 months ago
- Fast reinforcement learning 💨 ☆28 · Jul 15, 2025 · Updated 7 months ago
- Opinionated library for managing hyperparameters and mutable state of machine learning training systems. ☆19 · Aug 4, 2023 · Updated 2 years ago
- Pointax: PointMaze Environment for JAX ☆26 · Oct 22, 2025 · Updated 3 months ago
- VC-FB and MC-FB algorithms from "Zero-Shot Reinforcement Learning from Low Quality Data" (NeurIPS 2024) ☆22 · Jan 14, 2025 · Updated last year
- Actor-Sharer-Learner training framework for off-policy DRL algorithms ☆22 · Dec 29, 2024 · Updated last year
- RLHF experiments on a single A100 40G GPU. Support PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, DeepSeek R1-Zero reproducing. ☆79 · Feb 19, 2025 · Updated 11 months ago
- coded with and corrected by Google Anti-Gravity ☆13 · Nov 23, 2025 · Updated 2 months ago
- Python package for serving a local search engine. One command to download and serve a datastore---that's it 😎. ☆25 · Jun 6, 2025 · Updated 8 months ago
- Real-Time RTUs ☆11 · Jan 2, 2025 · Updated last year
- ☆10 · Oct 11, 2022 · Updated 3 years ago
- ☆21 · Dec 22, 2020 · Updated 5 years ago
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism" ☆14 · Apr 30, 2025 · Updated 9 months ago
- Sharpened Cosine Distance implementation in PyTorch ☆10 · Feb 1, 2022 · Updated 4 years ago
- ☆46 · Mar 31, 2025 · Updated 10 months ago
- ☆10 · Jun 27, 2024 · Updated last year
- Understanding R1-Zero-Like Training: A Critical Perspective ☆1,209 · Aug 27, 2025 · Updated 5 months ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆202 · Apr 17, 2025 · Updated 10 months ago
- ☆11 · Jun 14, 2019 · Updated 6 years ago
- [ICML 2023] Code for paper "Internally Rewarded Reinforcement Learning" ☆13 · Jul 21, 2023 · Updated 2 years ago
- OpenAI gym environments for goal-conditioned and language-conditioned reinforcement learning ☆14 · Jan 27, 2026 · Updated 3 weeks ago
- Object-Centric-Representation Library (OCRL): This repo is to explore OCR on various downstream tasks from supervised learning tasks to R… ☆12 · Feb 23, 2024 · Updated last year
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆56 · Dec 4, 2024 · Updated last year
- Lecture notes for a course on Decision and Game Theory for undergraduates studying AI ☆13 · Dec 14, 2018 · Updated 7 years ago
- Use the tokenizer in parallel to achieve superior acceleration ☆20 · Mar 21, 2024 · Updated last year
- Implementations of a large collection of reinforcement learning algorithms. ☆28 · Nov 30, 2023 · Updated 2 years ago
- ☆33 · Jan 19, 2026 · Updated 3 weeks ago
- ☆16 · Jul 16, 2024 · Updated last year
- [ICML 2025 GenBio Workshop] Official Implementation for "Electrostatics from Laplacian Eigenbasis for Neural Network Interatomic Potentia… ☆17 · Jun 12, 2025 · Updated 8 months ago
- Muon is Scalable for LLM Training ☆1,432 · Aug 3, 2025 · Updated 6 months ago
- Official release for the code used in paper: Learning from Active Human Involvement through Proxy Value Propagation (NeurIPS 2023 Spotlig … ☆33 · Jan 16, 2025 · Updated last year
- Codebase for the paper "How Crucial is Transformer in Decision Transformer?". Containing experiments on different pendulum tasks and code… ☆28 · Mar 24, 2023 · Updated 2 years ago
- Minimal hackable GRPO implementation ☆322 · Jan 31, 2025 · Updated last year
- A videogame made with PyGame turned into an Open AI Gym Learning Environment for Reinforcement Learning agents. ☆15 · Jan 3, 2023 · Updated 3 years ago
- ☆12 · Sep 7, 2024 · Updated last year
- [ICLR 2025 Spotlight] Weak-to-strong preference optimization: stealing reward from weak aligned model ☆16 · Feb 24, 2025 · Updated 11 months ago