HJSang/OPSD_OnPolicyDistillation


On-Policy Distillation built on top of Verl

☆92

Alternatives and similar repositories for OPSD_OnPolicyDistillation

Users that are interested in OPSD_OnPolicyDistillation are comparing it to the libraries listed below.

RUCBM / G-OPD
Official repository for the paper "Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation"
☆270 · May 28, 2026 · Updated last month
siyan-zhao / OPSD
☆489 · May 10, 2026 · Updated 2 months ago
HJSang / CRISP_Reasoning_Compression
☆62 · Jul 3, 2026 · Updated 2 weeks ago
beanie00 / self-distillation-analysis
Codebase for the work “Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?”
☆74 · Apr 14, 2026 · Updated 3 months ago
thinkwee / AwesomeOPD
Awesome List for On-Policy Distillation
☆759 · Jun 23, 2026 · Updated 3 weeks ago
thunlp / OPD
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
☆830 · Jun 29, 2026 · Updated 3 weeks ago
zwhong714 / Hybrid-Policy-Distillation
[ICML 2026] Hybrid Policy Distillation (HPD) is a practical distillation framework for reasoning-oriented language models. This repositor…
☆24 · Apr 24, 2026 · Updated 2 months ago
hhh675597 / revisiting_opd
[COLM 2026] Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
☆124 · May 19, 2026 · Updated 2 months ago
YoungZ365 / SOD
PyTorch-based open-source code for paper "SOD: Step-wise On-policy Distillation for Small Language Model Agents"
☆149 · May 22, 2026 · Updated last month
lasgroup / SDPO
Reinforcement Learning via Self-Distillation (SDPO)
☆1,017 · Jul 1, 2026 · Updated 2 weeks ago
machine981 / SCOPE
SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting
☆28 · Jun 22, 2026 · Updated 3 weeks ago
kokolerk / TCOD
[COLM 2026] TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents
☆81 · Jul 9, 2026 · Updated last week
LuckyyySTA / GOLF
☆18 · Mar 16, 2026 · Updated 4 months ago
WenjinHou / Uni-OPD
Uni-OPD: Unifying On-Policy Distillation with a Dual-Perspective Recipe
☆50 · Jun 10, 2026 · Updated last month
ltpo2025 / LTPO
[ICLR 2026] Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization
☆32 · Mar 6, 2026 · Updated 4 months ago
nick7nlp / Awesome-LLM-On-Policy-Distillation
A curated collection of papers and resources on On-Policy Distillation for Large Language Models.
☆456 · Updated this week
louieworth / trd
Official Implementation of Trajectory-Refined Distillation
☆26 · Jun 9, 2026 · Updated last month
maifoundations / Visionary-R1
Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
☆44 · Jul 2, 2025 · Updated last year
chrisliu298 / awesome-on-policy-distillation
A curated collection of papers, technical reports, frameworks, and tools for on-policy distillation (OPD) of large language models
☆542 · Updated this week
Koreyoshi01 / VISD
This repository is the official implementation for VISD.
☆21 · May 17, 2026 · Updated 2 months ago
idanshen / Self-Distillation
☆657 · Apr 7, 2026 · Updated 3 months ago
Zengwh02 / GlimpRouter
GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts
☆16 · Apr 24, 2026 · Updated 2 months ago
ZJU-REAL / Perceive-to-Reason
Perceive-to-Reason: Decoupling Perception and Reasoning for Fine-Grained Visual Reasoning
☆29 · Jul 8, 2026 · Updated last week
Gen-Verse / Open-AgentRL
RLAnything (ICML 2026) & AutoTool (ICML 2026), DemyAgent: Open-Source RL for LLMs and Agentic Scenarios
☆586 · Jun 12, 2026 · Updated last month
ZJU-REAL / SDAR
Official code for "Self-Distilled Agentic Reinforcement Learning"
☆310 · Updated this week
Nebularaid2000 / rethink_sft_generalization
Repo for paper "Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability"
☆108 · Apr 23, 2026 · Updated 2 months ago
princeton-pli / AggAgent
☆28 · Apr 29, 2026 · Updated 2 months ago
Peregrine123 / ROPD_official
☆72 · May 8, 2026 · Updated 2 months ago
jwkirchenbauer / mtp-lm
Source code to accompany a research paper on training multi-token prediction language models using self-distillation.
☆39 · Feb 21, 2026 · Updated 5 months ago
ZJU-REAL / SkillZero
Official code for "SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization"
☆350 · May 20, 2026 · Updated 2 months ago
xuwenxinedu / R3
☆30 · Apr 7, 2026 · Updated 3 months ago
mbzuai-oryx / Video-CoM
Video-CoM: Interactive Video Reasoning via Chain of Manipulations
☆22 · Jun 17, 2026 · Updated last month
songmzhang / KDFlow
A user-friendly & efficient knowledge distillation framework for LLMs, supporting off-policy, on-policy (OPD), cross-tokenizer, multimoda…
☆220 · Updated this week
xzxxntxdy / PEPO
Official repo for “Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought”
☆26 · Mar 29, 2026 · Updated 3 months ago
Hesse73 / RLVR-Directions
Source Code for our ICLR'26 paper
☆17 · Feb 22, 2026 · Updated 4 months ago
UCSB-AI / DMLR
[CVPR 2026] Official codebase for the paper "Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space"
☆84 · May 12, 2026 · Updated 2 months ago
Zhaoyi-Li21 / creme
[ACL 2024] "Understanding and Patching Compositional Reasoning in LLMs"
☆14 · Aug 28, 2024 · Updated last year
FloyedShen / AntiSD
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information
☆32 · May 14, 2026 · Updated 2 months ago
inclusionAI / Zooming-without-Zooming
[ICML 2026] ZwZ model family: SOTA fine-grained perception performance; ZoomBench: a new challenging perception benchmark
☆174 · May 4, 2026 · Updated 2 months ago