brendanhogan/DeepSeekRL-Extended

brendanhogan / DeepSeekRL-Extended

Exploring Applications of GRPO

☆252

Alternatives and similar repositories for DeepSeekRL-Extended

Users who are interested in DeepSeekRL-Extended are comparing it to the repositories listed below.

brendanhogan / picoDeepResearch
☆69 · May 23, 2025 · Updated last year
brendanhogan / completion_tree_view
☆15 · Apr 26, 2025 · Updated last year
groundlight / r1_vlm
Build your own visual reasoning model
☆421 · Jan 13, 2026 · Updated 6 months ago
PrimeIntellect-ai / verifiers
Our library for RL environments + evals
☆4,413 · Updated this week
rosmineb / unit_test_rl
Project code for training LLMs to write better unit tests + code
☆22 · May 19, 2025 · Updated last year
xeophon / beam
☆16 · Feb 22, 2026 · Updated 5 months ago
open-thought / tiny-grpo
Minimal hackable GRPO implementation
☆344 · Jan 31, 2025 · Updated last year
ivanleomk / modal-grpo
☆19 · Mar 16, 2025 · Updated last year
willccbb / agent-engineering
Agent Engineering course files
☆72 · Jul 12, 2025 · Updated last year
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆81 · Jul 17, 2026 · Updated last week
usamec / lowmem_finetuning
Low memory full parameter finetuning of LLMs
☆54 · Jul 18, 2025 · Updated last year
Alex-Gurung / ReasoningNCP
Official repo for Learning to Reason for Long-Form Story Generation
☆78 · Apr 19, 2025 · Updated last year
haizelabs / j1-micro
j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models.
☆105 · Jul 19, 2025 · Updated last year
PrimeIntellect-ai / genesys
☆139 · Mar 20, 2025 · Updated last year
Open-Reasoner-Zero / Open-Reasoner-Zero
Official Repo for Open-Reasoner-Zero
☆2,096 · Jun 2, 2025 · Updated last year
open-thought / reasoning-gym
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
☆1,469 · Apr 17, 2026 · Updated 3 months ago
PrimeIntellect-ai / lab-cookbook
Lab Cookbook
☆38 · Updated this week
Danau5tin / calculator_agent_rl
Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.
☆75 · May 5, 2025 · Updated last year
andrew-silva / mlx-rlhf
An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and HuggingFace.
☆37 · Jun 21, 2024 · Updated 2 years ago
tokenbender / avataRL
rl from zero pretrain, can it be done? yes.
☆295 · Sep 28, 2025 · Updated 10 months ago
SeunghyunSEO / optimized_hf_llama_class_for_training
☆47 · Aug 29, 2024 · Updated last year
brendanhogan / 2025_advent_of_small_ml
☆22 · Dec 24, 2025 · Updated 7 months ago
janhq / verifiers-deepresearch
Verifiers for LLM Reinforcement Learning
☆83 · Sep 11, 2025 · Updated 10 months ago
VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆72 · Apr 22, 2025 · Updated last year
rllm-org / rllm
Democratizing Reinforcement Learning for LLMs
☆5,744 · Updated this week
McGill-NLP / nano-aha-moment
Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"
☆625 · Oct 7, 2025 · Updated 9 months ago
collinear-ai / spider
Streamline on-policy/off-policy distillation workflows in a few lines of code
☆109 · Jul 22, 2026 · Updated last week
sail-sg / understand-r1-zero
Understanding R1-Zero-Like Training: A Critical Perspective
☆1,269 · Aug 27, 2025 · Updated 11 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆60 · Oct 18, 2025 · Updated 9 months ago
imoneoi / bf16_fused_adam
BFloat16 Fused Adam Operator for PyTorch
☆20 · Nov 16, 2024 · Updated last year
WolframRavenwolf / MMLU-Pro
MMLU-Pro eval results
☆15 · Aug 21, 2025 · Updated 11 months ago
dnakov / convx
TUI conversation explorer for Claude Code & OpenCode
☆20 · Aug 21, 2025 · Updated 11 months ago
JoeLi12345 / nGPT
An open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆112 · Mar 7, 2025 · Updated last year
convergence-ai / lm2
Official repo of paper LM2
☆49 · Feb 13, 2025 · Updated last year
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆59 · Feb 6, 2025 · Updated last year
TextArena / UnstableBaselines
☆120 · Apr 7, 2026 · Updated 3 months ago
ChenmienTan / RL2
☆1,298 · May 20, 2026 · Updated 2 months ago
abacaj / train-with-fsdp
☆93 · Oct 5, 2023 · Updated 2 years ago
dnakov / computer-use
macOS computer use CLI: screenshots, input simulation, app management, session orchestration
☆24 · Mar 24, 2026 · Updated 4 months ago