chrisliu298/awesome-on-policy-distillation

A curated collection of papers, technical reports, frameworks, and tools for on-policy distillation (OPD) of large language models

☆547

Alternatives and similar repositories for awesome-on-policy-distillation

Users that are interested in awesome-on-policy-distillation are comparing it to the libraries listed below.

thinkwee / AwesomeOPD
Awesome List for On-Policy Distillation
☆760 · Jun 23, 2026 · Updated 3 weeks ago
nick7nlp / Awesome-LLM-On-Policy-Distillation
A curated collection of papers and resources on On-Policy Distillation for Large Language Models.
☆461 · Updated this week
thunlp / OPD
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
☆835 · Jun 29, 2026 · Updated 3 weeks ago
siyan-zhao / OPSD
☆491 · May 10, 2026 · Updated 2 months ago
lasgroup / SDPO
Reinforcement Learning via Self-Distillation (SDPO)
☆1,017 · Jul 1, 2026 · Updated 2 weeks ago
RUCBM / G-OPD
Official repository for the paper "Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation"
☆271 · May 28, 2026 · Updated last month
VisionOPD / Vision-OPD
Vision-OPD is a regional-to-global on-policy self-distillation framework that transfers a model's own privileged crop-conditioned percept…
☆197 · Updated this week
songmzhang / KDFlow
A user-friendly & efficient knowledge distillation framework for LLMs, supporting off-policy, on-policy (OPD), cross-tokenizer, multimoda…
☆222 · Updated this week
idanshen / Self-Distillation
☆658 · Apr 7, 2026 · Updated 3 months ago
HJSang / CRISP_Reasoning_Compression
☆62 · Jul 3, 2026 · Updated 2 weeks ago
HJSang / OPSD_OnPolicyDistillation
On-Policy Distillation built on top of Verl
☆92 · May 25, 2026 · Updated last month
FloyedShen / AntiSD
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information
☆32 · May 14, 2026 · Updated 2 months ago
THUDM / slime
slime is an LLM post-training framework for RL Scaling.
☆7,569 · Updated this week
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,587 · Updated this week
machine981 / SCOPE
SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting
☆28 · Jun 22, 2026 · Updated last month
ozyyshr / RAST
Reasoning Activation in LLMs via Small Model Transfer (NeurIPS 2025)
☆22 · Oct 16, 2025 · Updated 9 months ago
ZJU-REAL / SDAR
Official code for "Self-Distilled Agentic Reinforcement Learning"
☆310 · Updated this week
kokolerk / TCOD
[COLM 2026] TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents
☆83 · Jul 9, 2026 · Updated last week
hiyouga / EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
☆5,074 · Updated this week
jet-ai-projects / Lightning-OPD
☆67 · May 12, 2026 · Updated 2 months ago
XIAO4579 / PRISM
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
☆96 · May 6, 2026 · Updated 2 months ago
langfengQ / verl-agent
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in…
☆2,140 · Jun 9, 2026 · Updated last month
lauyikfung / SDPG
SDPG: Self-Distilled Policy Gradient
☆46 · Jun 15, 2026 · Updated last month
WenjinHou / Uni-OPD
Uni-OPD: Unifying On-Policy Distillation with a Dual-Perspective Recipe
☆50 · Jun 10, 2026 · Updated last month
HHHHHejia / Awesome-AgenticLLM-RL-Papers
☆1,841 · Jun 18, 2026 · Updated last month
beanie00 / self-distillation-analysis
Codebase for the work “Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?”
☆74 · Apr 14, 2026 · Updated 3 months ago
Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs
This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…
☆1,435 · May 11, 2026 · Updated 2 months ago
MikeWangWZHL / PAPO
Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"
☆151 · Feb 4, 2026 · Updated 5 months ago
rStar-RL / LoongRL
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts (ICLR 2026 Oral)
☆35 · Feb 20, 2026 · Updated 5 months ago
PeterGriffinJin / Search-R1
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
☆5,130 · Nov 13, 2025 · Updated 8 months ago
TsinghuaC3I / Awesome-RL-for-LRMs
A Survey of Reinforcement Learning for Large Reasoning Models
☆2,467 · Nov 9, 2025 · Updated 8 months ago
thinkwee / AgentsMeetRL
Awesome List for Agentic RL
☆1,705 · Jun 20, 2026 · Updated last month
ShenzhiYang2000 / OPRD
OPRD: On-Policy Representation Distillation
☆44 · Updated this week
YoungZ365 / SOD
PyTorch-based open-source code for paper "SOD: Step-wise On-policy Distillation for Small Language Model Agents"
☆150 · May 22, 2026 · Updated 2 months ago
sheriyuo / DART
Reasoning and Tool-use Compete in Agentic RL: From Quantifying Interference to Disentangled Tuning
☆31 · May 7, 2026 · Updated 2 months ago
NVIDIA-NeMo / labs-molt
☆546 · Updated this week
Jianglin954 / awesome-on-policy-distillation
A curated list of resources on on-policy distillation
☆25 · Apr 13, 2026 · Updated 3 months ago
threegold116 / Awesome-Omni-MLLMs
This is for ACL 2025 Findings Paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities
☆103 · Mar 22, 2026 · Updated 3 months ago
iie-ycx / RLSD
Code of Self-Distilled RLVR - RLSD
☆58 · May 19, 2026 · Updated 2 months ago