joey00072/nanoGRPO

joey00072 / nanoGRPO

nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)

☆143

Alternatives and similar repositories for nanoGRPO

Users who are interested in nanoGRPO are comparing it to the libraries listed below.

machinestein / Zero-Shot-Off-Policy-Learning
Official PyTorch Implementation of "Zero-Shot Off-Policy Learning" (ICML 2026)
☆25 · Feb 16, 2026 · Updated 5 months ago
kmohan321 / Research_Papers
☆45 · Mar 31, 2025 · Updated last year
Howuhh / streaming-drl-jax
streaming deep reinforcement learning but 4x faster with jax!
☆19 · Jan 4, 2026 · Updated 6 months ago
nmonette / NCC-UED
Official Implementation of `An Optimisation Framework for Unsupervised Environment Design` from RLC 2025
☆17 · Nov 24, 2025 · Updated 7 months ago
VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆72 · Apr 22, 2025 · Updated last year
flowersteam / EAGER
☆10 · Oct 11, 2022 · Updated 3 years ago
cswinter / hyperstate
Opinionated library for managing hyperparameters and mutable state of machine learning training systems.
☆19 · Aug 4, 2023 · Updated 2 years ago
jackfsuia / nanoRLHF
RLHF experiments on a single A100 40G GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and DeepSeek R1-Zero reproduction.
☆80 · Feb 19, 2025 · Updated last year
esraaelelimy / rtus
Real-Time RTUs
☆12 · Mar 20, 2026 · Updated 4 months ago
codingfisch / flashrl
Fast reinforcement learning 💨
☆29 · Jul 15, 2025 · Updated last year
frankroeder / lanro-gym
OpenAI gym environments for goal-conditioned and language-conditioned reinforcement learning
☆14 · Jan 27, 2026 · Updated 5 months ago
SDharashivkar / TrojanVectors
This repo contains a demo of adversarial strings poisoning a vector database and forcing specific hallucinations on a RAG chatbot.
☆10 · May 2, 2024 · Updated 2 years ago
XinJingHao / Actor-Sharer-Learner
Actor-Sharer-Learner training framework for off-policy DRL algorithms
☆22 · Dec 29, 2024 · Updated last year
Motsepe-Jr / AI-research-papers-pseudo-code
This repo covers pseudocode for AI research papers.
☆18 · Jun 20, 2023 · Updated 3 years ago
sail-sg / understand-r1-zero
Understanding R1-Zero-Like Training: A Critical Perspective
☆1,267 · Aug 27, 2025 · Updated 10 months ago
open-thought / tiny-grpo
Minimal hackable GRPO implementation
☆344 · Jan 31, 2025 · Updated last year
mengdi-li / internally-rewarded-rl
[ICML 2023] Code for paper "Internally Rewarded Reinforcement Learning"
☆13 · Jul 21, 2023 · Updated 3 years ago
machado-research / AgarCL
Agar.io for Continual Reinforcement Learning
☆24 · Jul 24, 2025 · Updated 11 months ago
CLAIRE-Labo / flash_attention
A basic pure PyTorch implementation of flash attention
☆17 · Oct 28, 2024 · Updated last year
HeyuanMingong / llirl
Code for "LifeLong Incremental Reinforcement Learning (LLIRL)"
☆21 · Jan 28, 2021 · Updated 5 years ago
Cranial-XIX / longhorn
Official PyTorch Implementation of the Longhorn Deep State Space Model
☆57 · Dec 4, 2024 · Updated last year
tokenbender / avataRL
rl from zero pretrain, can it be done? yes.
☆295 · Sep 28, 2025 · Updated 9 months ago
etherealcomputing / Converge
A bespoke time‑first language + toolchain for hybrid Neuromorphic - classical systems
☆11 · Feb 4, 2026 · Updated 5 months ago
naivoder / MCTSr
Monte Carlo Tree Search Self-Refine (MCTSr)
☆21 · Jul 6, 2024 · Updated 2 years ago
MoonshotAI / Moonlight
Muon is Scalable for LLM Training
☆1,508 · Aug 3, 2025 · Updated 11 months ago
IouJenLiu / HTS-RL
☆21 · Dec 22, 2020 · Updated 5 years ago
EmptyJackson / policy-guided-diffusion
Official implementation of the RLC 2024 paper "Policy-Guided Diffusion"
☆153 · Jul 19, 2024 · Updated 2 years ago
Kchu / LABOR-Agent
Official implementation for the LABOR (LAnguage-model-based Bimanual ORchestration) Agent.
☆23 · Nov 23, 2024 · Updated last year
jsikyoon / OCRL
Object-Centric-Representation Library (OCRL): This repo is to explore OCR on various downstream tasks from supervised learning tasks to R…
☆12 · Feb 23, 2024 · Updated 2 years ago
aliang8 / varibad_jax
☆10 · Jun 27, 2024 · Updated 2 years ago
goombalab / Gather-and-Aggregate
Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism"
☆16 · Apr 30, 2025 · Updated last year
aielawady / relic
☆12 · Sep 7, 2024 · Updated last year
enjeeneer / zero-shot-rl
VC-FB and MC-FB algorithms from "Zero-Shot Reinforcement Learning from Low Quality Data" (NeurIPS 2024)
☆29 · Jan 14, 2025 · Updated last year
MichaelTMatthews / purejaxgcrl
GCRL in JAX. Official repository for LEO (ICML 2026).
☆27 · Jun 20, 2026 · Updated last month
xf-zhao / Agentic-Skill-Discovery
Official implementation of Zero-Hero paper
☆31 · Feb 13, 2025 · Updated last year
Rhoban / footstepnet_envs
☆19 · Jul 4, 2025 · Updated last year
junmokane / spatially-aware-transformer
☆10 · Dec 10, 2024 · Updated last year
McGill-NLP / nano-aha-moment
Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"
☆625 · Oct 7, 2025 · Updated 9 months ago
mttga / purejaxql
Simple single-file baselines for Q-Learning in pure-GPU setting
☆242 · Nov 24, 2025 · Updated 7 months ago