lasgroup/SDPO

Reinforcement Learning via Self-Distillation (SDPO)

☆1,021

Alternatives and similar repositories for SDPO

Users who are interested in SDPO are comparing it to the repositories listed below.

idanshen / Self-Distillation
☆662 · Apr 7, 2026 · Updated 3 months ago
siyan-zhao / OPSD
☆504 · May 10, 2026 · Updated 2 months ago
RUCBM / G-OPD
Official repository for the paper "Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation"
☆274 · May 28, 2026 · Updated last month
beanie00 / self-distillation-analysis
Codebase for the work “Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?”
☆74 · Apr 14, 2026 · Updated 3 months ago
thunlp / OPD
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
☆843 · Jun 29, 2026 · Updated 3 weeks ago
HJSang / CRISP_Reasoning_Compression
☆62 · Jul 3, 2026 · Updated 3 weeks ago
chrisliu298 / awesome-on-policy-distillation
A curated collection of papers, technical reports, frameworks, and tools for on-policy distillation (OPD) of large language models
☆562 · Updated this week
thinkwee / AwesomeOPD
Awesome List for On-Policy Distillation
☆763 · Updated this week
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,649 · Updated this week
langfengQ / verl-agent
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in…
☆2,151 · Jun 9, 2026 · Updated last month
HJSang / OPSD_OnPolicyDistillation
On Policy Distillation Build on top of Verl
☆92 · May 25, 2026 · Updated 2 months ago
lili-chen / rltf
Reinforcement Learning from Text Feedback
☆49 · Feb 17, 2026 · Updated 5 months ago
Simplified-Reasoning / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆460 · Mar 20, 2026 · Updated 4 months ago
THUDM / slime
slime is an LLM post-training framework for RL Scaling.
☆7,621 · Updated this week
hhh675597 / revisiting_opd
[COLM 2026] Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
☆126 · May 19, 2026 · Updated 2 months ago
Gen-Verse / Open-AgentRL
RLAnything (ICML 2026) & AutoTool (ICML 2026), DemyAgent: Open-Source RL for LLMs and Agentic Scenarios
☆591 · Jun 12, 2026 · Updated last month
Peregrine123 / ROPD_official
☆74 · May 8, 2026 · Updated 2 months ago
RUC-NLPIR / ARPO
[ICLR 2026] Agentic Reinforced Policy Optimization (ARPO)
☆1,092 · Jul 13, 2026 · Updated last week
aiming-lab / SkillRL
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
☆904 · May 17, 2026 · Updated 2 months ago
ZJU-REAL / SkillZero
Official code for "SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization"
☆355 · Updated this week
Gen-Verse / OpenClaw-RL
OpenClaw-RL: Train any agent simply by talking
☆5,606 · May 23, 2026 · Updated 2 months ago
hiyouga / EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
☆5,081 · Updated this week
ruixin31 / Spurious_Rewards
☆361 · Jul 29, 2025 · Updated 11 months ago
BytedTsinghua-SIA / DAPO
An Open-source RL System from ByteDance Seed and Tsinghua AIR
☆1,846 · May 11, 2025 · Updated last year
PRIME-RL / TTRL
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
☆1,103 · Apr 15, 2026 · Updated 3 months ago
PRIME-RL / Entropy-Mechanism-of-RL
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
☆444 · Jul 11, 2025 · Updated last year
lasgroup / user_interactions
Aligning Language Models from User Interactions via Self-Distillation
☆26 · Mar 31, 2026 · Updated 3 months ago
areal-project / AReaL
The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.
☆5,596 · Updated this week
mll-lab-nu / RAGEN
RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.
☆2,756 · Apr 14, 2026 · Updated 3 months ago
tajwarfahim / maxrl
Official Implementation of "Maximum Likelihood Reinforcement Learning (MaxRL)"
☆200 · May 28, 2026 · Updated last month
TsinghuaC3I / Awesome-RL-for-LRMs
A Survey of Reinforcement Learning for Large Reasoning Models
☆2,468 · Nov 9, 2025 · Updated 8 months ago
Zhiyuan-Zeng / RLVE
[ICML 2026] RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments
☆225 · Apr 30, 2026 · Updated 2 months ago
jet-ai-projects / Lightning-OPD
☆67 · May 12, 2026 · Updated 2 months ago
PeterGriffinJin / Search-R1
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
☆5,150 · Nov 13, 2025 · Updated 8 months ago
rllm-org / rllm
Democratizing Reinforcement Learning for LLMs
☆5,727 · Updated this week
NVlabs / GDPO
Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
☆491 · May 20, 2026 · Updated 2 months ago
thinking-machines-lab / tinker-cookbook
Post-training with Tinker
☆3,906 · Updated this week
sail-sg / understand-r1-zero
Understanding R1-Zero-Like Training: A Critical Perspective
☆1,268 · Aug 27, 2025 · Updated 10 months ago
Open-Reasoner-Zero / Open-Reasoner-Zero
Official Repo for Open-Reasoner-Zero
☆2,096 · Jun 2, 2025 · Updated last year