sail-sg/Stable-RL

Rethinking the Trust Region in LLM Reinforcement Learning

☆62

Alternatives and similar repositories for Stable-RL

Users who are interested in Stable-RL are comparing it to the repositories listed below.

sail-sg / VeriFree
Reinforcing General Reasoning without Verifiers
☆102 · Jun 24, 2025 · Updated last year
mit-han-lab / vcpo
[ICML 2026] Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs
☆29 · Apr 27, 2026 · Updated 2 months ago
OpenRewardAI / openreward-cookbook
Training and evaluating with OpenReward
☆33 · Apr 28, 2026 · Updated 2 months ago
tajwarfahim / maxrl
Official Implementation of "Maximum Likelihood Reinforcement Learning (MaxRL)"
☆199 · May 28, 2026 · Updated last month
princeton-pli / retaining-by-doing
☆44 · Dec 25, 2025 · Updated 6 months ago
complex-reasoning / RPG
[ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508)
☆76 · Jun 29, 2026 · Updated 3 weeks ago
nuprl / MultiPL-T
Knowledge transfer from high-resource to low-resource programming languages for Code LLMs
☆17 · Aug 12, 2025 · Updated 11 months ago
TianHongZXY / RLVR-Decomposed
[NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"
☆165 · Mar 2, 2026 · Updated 4 months ago
sail-sg / variational-reasoning
Code for "Variational Reasoning for Language Models"
☆60 · Sep 29, 2025 · Updated 9 months ago
Infini-AI-Lab / M2PO
☆32 · Oct 8, 2025 · Updated 9 months ago
LeapLabTHU / limit-of-RLVR
repo for paper https://arxiv.org/abs/2504.13837
☆345 · Dec 17, 2025 · Updated 7 months ago
yihedeng9 / DuoGuard
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
☆34 · Feb 26, 2025 · Updated last year
qwenpilot / FIPO
This code implements the algorithm of FIPO, a value-free RL recipe for eliciting deeper reasoning from a clean base model.
☆130 · Apr 7, 2026 · Updated 3 months ago
lasgroup / SDPO
Reinforcement Learning via Self-Distillation (SDPO)
☆1,017 · Jul 1, 2026 · Updated 3 weeks ago
ZNLP / Language-Imbalance-Driven-Rewarding
[ICLR 2025] Language Imbalance Driven Rewarding for Multilingual Self-improving
☆25 · Apr 6, 2026 · Updated 3 months ago
sail-sg / oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
☆667 · Jan 29, 2026 · Updated 5 months ago
kvfrans / matrix-whitening
Code for "What really matters in matrix-whitening optimizers?"
☆25 · Oct 31, 2025 · Updated 8 months ago
bethgelab / delta-belief-rl
Official implementation of the ΔBelief-RL method.
☆31 · Feb 28, 2026 · Updated 4 months ago
VITA-Group / Data-Efficient-Scaling
[ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang
☆14 · Jan 4, 2024 · Updated 2 years ago
maifoundations / Visionary-R1
Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
☆44 · Jul 2, 2025 · Updated last year
sail-sg / Precision-RL
Defeating the Training-Inference Mismatch via FP16
☆197 · Nov 14, 2025 · Updated 8 months ago
YuanheZ / DAG-MATH
[ICLR2026] DAG-Math: Graph-of-Thought Guided Mathematical Reasoning in LLMs
☆23 · Oct 19, 2025 · Updated 9 months ago
evolvent-ai / ClawMark
🦞 ClawMark: A Living-World Benchmark for Multi-Day, Multimodal Coworker Agents
☆117 · May 28, 2026 · Updated last month
Tencent-Hunyuan / UniRL
UniRL is a Framework for Unified Multimodal Model Reinforcement Learning
☆843 · Updated this week
beanie00 / self-distillation-analysis
Codebase for the work “Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?”
☆74 · Apr 14, 2026 · Updated 3 months ago
princeton-pli / AggAgent
☆28 · Apr 29, 2026 · Updated 2 months ago
tang-bd / v-grpo
[CVPR 2026 Findings] V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think
☆56 · Apr 28, 2026 · Updated 2 months ago
sail-sg / odc
On demand communication
☆34 · Apr 16, 2026 · Updated 3 months ago
ars22 / e3
☆20 · Sep 16, 2025 · Updated 10 months ago
sail-sg / VocabularyParallelism
Vocabulary Parallelism
☆26 · Mar 10, 2025 · Updated last year
Interplay-LM-Reasoning / Interplay-LM-Reasoning
[ICML 2026 Spotlight] On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
☆162 · Jun 8, 2026 · Updated last month
Re-Align / AlignTDS
Analyzing LLM Alignment via Token distribution shift
☆17 · Jan 26, 2024 · Updated 2 years ago
JeanKaddour / tpo
Target Policy Optimization (JAX)
☆30 · Apr 18, 2026 · Updated 3 months ago
JinjieNi / dlms-are-super-data-learners
The official github repo for "Diffusion Language Models are Super Data Learners".
☆227 · Nov 6, 2025 · Updated 8 months ago
allenai / numglue
NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks
☆20 · May 10, 2022 · Updated 4 years ago
Sphere-AI-Lab / OrthoMerge
Implementation of <Orthogonal Model Merging>
☆33 · May 27, 2026 · Updated last month
bbartoldson / TBA
Official implementation of TBA for async LLM post-training.
☆31 · Nov 5, 2025 · Updated 8 months ago
Harahan / RTDMD
[arXiv 2026] This is the official PyTorch implementation of "RTDMD: Reinforcing Few-step Generators via Reward-Tilted Distribution Matchi…
☆41 · Jun 6, 2026 · Updated last month
sail-sg / AnytimeReasoner
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
☆54 · Jul 15, 2025 · Updated last year