thunlp/OPD

thunlp / OPD

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

☆855

Alternatives and similar repositories for OPD

Users that are interested in OPD are comparing it to the libraries listed below.

RUCBM / G-OPD
Official repository for the paper "Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation"
☆276 · May 28, 2026 · Updated 2 months ago
thinkwee / AwesomeOPD
Awesome List for On-Policy Distillation
☆770 · Updated this week
siyan-zhao / OPSD
☆515 · May 10, 2026 · Updated 2 months ago
chrisliu298 / awesome-on-policy-distillation
A curated collection of papers, technical reports, frameworks, and tools for on-policy distillation (OPD) of large language models
☆572 · Updated this week
lasgroup / SDPO
Reinforcement Learning via Self-Distillation (SDPO)
☆1,027 · Jul 1, 2026 · Updated 3 weeks ago
nick7nlp / Awesome-LLM-On-Policy-Distillation
A curated collection of papers and resources on On-Policy Distillation for Large Language Models.
☆473 · Updated this week
hhh675597 / revisiting_opd
[COLM 2026] Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
☆126 · May 19, 2026 · Updated 2 months ago
HJSang / OPSD_OnPolicyDistillation
On Policy Distillation Build on top of Verl
☆92 · May 25, 2026 · Updated 2 months ago
CostaliyA / Flow-OPD
Official Repo of "Flow-OPD: On-Policy Distillation for Flow Matching Models"
☆265 · Jun 24, 2026 · Updated last month
idanshen / Self-Distillation
☆664 · Apr 7, 2026 · Updated 3 months ago
songmzhang / KDFlow
A user-friendly & efficient knowledge distillation framework for LLMs, supporting off-policy, on-policy (OPD), cross-tokenizer, multimoda…
☆228 · Updated this week
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,699 · Updated this week
VisionOPD / Vision-OPD
Vision-OPD is a regional-to-global on-policy self-distillation framework that transfers a model's own privileged crop-conditioned percept…
☆214 · Jul 17, 2026 · Updated last week
louieworth / trd
Official Implementation of Trajectory-Refined Distillation
☆30 · Jun 9, 2026 · Updated last month
THUDM / slime
slime is an LLM post-training framework for RL Scaling.
☆7,679 · Updated this week
langfengQ / verl-agent
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in…
☆2,158 · Jun 9, 2026 · Updated last month
HJSang / CRISP_Reasoning_Compression
☆62 · Jul 3, 2026 · Updated 3 weeks ago
WenjinHou / Uni-OPD
Uni-OPD: Unifying On-Policy Distillation with a Dual-Perspective Recipe
☆52 · Jun 10, 2026 · Updated last month
hiyouga / EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
☆5,083 · Updated this week
ZJU-REAL / SDAR
Official code for "Self-Distilled Agentic Reinforcement Learning"
☆313 · Updated this week
PeterGriffinJin / Search-R1
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
☆5,170 · Nov 13, 2025 · Updated 8 months ago
Peregrine123 / ROPD_official
☆76 · May 8, 2026 · Updated 2 months ago
jet-ai-projects / Lightning-OPD
☆68 · May 12, 2026 · Updated 2 months ago
OpenRLHF / OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy…
☆9,855 · Jul 14, 2026 · Updated 2 weeks ago
HHHHHejia / Awesome-AgenticLLM-RL-Papers
☆1,849 · Jun 18, 2026 · Updated last month
RUC-NLPIR / ARPO
[ICLR 2026] Agentic Reinforced Policy Optimization (ARPO)
☆1,093 · Jul 13, 2026 · Updated 2 weeks ago
BytedTsinghua-SIA / Direct-OPD
Weak-to-Strong Generalization via Direct On-Policy Distillation
☆65 · Jul 14, 2026 · Updated 2 weeks ago
thunlp / JustRL
[ICLR 2026 Blogpost Track Poster] JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
☆292 · Jun 29, 2026 · Updated last month
ShenzhiYang2000 / OPRD
OPRD: On-Policy Representation Distillation (https://arxiv.org/abs/2606.06021)
☆51 · Jul 16, 2026 · Updated last week
PRIME-RL / TTRL
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
☆1,103 · Apr 15, 2026 · Updated 3 months ago
Gen-Verse / Open-AgentRL
RLAnything (ICML 2026) & AutoTool (ICML 2026), DemyAgent: Open-Source RL for LLMs and Agentic Scenarios
☆595 · Jun 12, 2026 · Updated last month
kokolerk / TCOD
[COLM 2026] TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents
☆88 · Jul 9, 2026 · Updated 2 weeks ago
Visual-Agent / DeepEyes
☆1,251 · Nov 20, 2025 · Updated 8 months ago
PRIME-RL / Entropy-Mechanism-of-RL
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
☆446 · Jul 11, 2025 · Updated last year
TsinghuaC3I / Awesome-RL-for-LRMs
A Survey of Reinforcement Learning for Large Reasoning Models
☆2,470 · Nov 9, 2025 · Updated 8 months ago
beanie00 / self-distillation-analysis
Codebase for the work “Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?”
☆75 · Apr 14, 2026 · Updated 3 months ago
Gen-Verse / OpenClaw-RL
OpenClaw-RL: Train any agent simply by talking
☆5,611 · May 23, 2026 · Updated 2 months ago
thinkwee / AgentsMeetRL
Awesome List for Agentic RL
☆1,730 · Updated this week
TIGER-AI-Lab / verl-tool
A version of verl to support diverse tool use [TMLR 2026]
☆1,026 · Jul 15, 2026 · Updated 2 weeks ago