RUCBM/G-OPD

Official repository for the paper "Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation"

☆271

Alternatives and similar repositories for G-OPD

Users interested in G-OPD are comparing it to the repositories listed below.

thunlp / OPD
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
☆835 · Jun 29, 2026 · Updated 3 weeks ago
HJSang / OPSD_OnPolicyDistillation
On-Policy Distillation built on top of verl
☆92 · May 25, 2026 · Updated last month
thinkwee / AwesomeOPD
Awesome List for On-Policy Distillation
☆760 · Jun 23, 2026 · Updated 3 weeks ago
lasgroup / SDPO
Reinforcement Learning via Self-Distillation (SDPO)
☆1,017 · Jul 1, 2026 · Updated 2 weeks ago
siyan-zhao / OPSD
☆491 · May 10, 2026 · Updated 2 months ago
hhh675597 / revisiting_opd
[COLM 2026] Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
☆124 · May 19, 2026 · Updated 2 months ago
songmzhang / KDFlow
A user-friendly & efficient knowledge distillation framework for LLMs, supporting off-policy, on-policy (OPD), cross-tokenizer, multimoda…
☆222 · Updated this week
Peregrine123 / ROPD_official
☆73 · May 8, 2026 · Updated 2 months ago
zwhong714 / Hybrid-Policy-Distillation
[ICML 2026] Hybrid Policy Distillation (HPD) is a practical distillation framework for reasoning-oriented language models. This repositor…
☆24 · Apr 24, 2026 · Updated 2 months ago
VisionOPD / Vision-OPD
Vision-OPD is a regional-to-global on-policy self-distillation framework that transfers a model's own privileged crop-conditioned percept…
☆197 · Updated this week
chrisliu298 / awesome-on-policy-distillation
A curated collection of papers, technical reports, frameworks, and tools for on-policy distillation (OPD) of large language models
☆547 · Updated this week
HJSang / CRISP_Reasoning_Compression
☆62 · Jul 3, 2026 · Updated 2 weeks ago
idanshen / Self-Distillation
☆658 · Apr 7, 2026 · Updated 3 months ago
chiefovoavicii / MAD-OPD
Official code for "Breaking the Ceiling in On-Policy Distillation via Multi-Agent Debate" (arXiv:2605.01347).
☆31 · May 7, 2026 · Updated 2 months ago
jet-ai-projects / Lightning-OPD
☆67 · May 12, 2026 · Updated 2 months ago
langfengQ / verl-agent
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in…
☆2,140 · Jun 9, 2026 · Updated last month
nick7nlp / Awesome-LLM-On-Policy-Distillation
A curated collection of papers and resources on On-Policy Distillation for Large Language Models.
☆461 · Updated this week
ZJU-REAL / SDAR
Official code for "Self-Distilled Agentic Reinforcement Learning"
☆310 · Updated this week
YoungZ365 / SOD
PyTorch-based open-source code for paper "SOD: Step-wise On-policy Distillation for Small Language Model Agents"
☆150 · May 22, 2026 · Updated 2 months ago
ShenzhiYang2000 / OPRD
OPRD: On-Policy Representation Distillation
☆44 · Updated this week
Utaotao / ProFit
☆35 · Jan 20, 2026 · Updated 6 months ago
machine981 / SCOPE
SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting
☆28 · Jun 22, 2026 · Updated 3 weeks ago
LuckyyySTA / GOLF
☆18 · Mar 16, 2026 · Updated 4 months ago
Qwen-Applications / GD2PO
☆20 · Jun 16, 2026 · Updated last month
iie-ycx / RLSD
Code of Self-Distilled RLVR - RLSD
☆58 · May 19, 2026 · Updated 2 months ago
caiyuchen-ustc / EffOPD
Repository for EffOPD. We are working on polishing the details.
☆71 · May 16, 2026 · Updated 2 months ago
inclusionAI / Zooming-without-Zooming
[ICML 2026] ZwZ model family: SOTA fine-grained perception performance; ZoomBench: a new challenging perception benchmark
☆174 · May 4, 2026 · Updated 2 months ago
lili-chen / rltf
Reinforcement Learning from Text Feedback
☆49 · Feb 17, 2026 · Updated 5 months ago
CostaliyA / Flow-OPD
Official Repo of "Flow-OPD: On-Policy Distillation for Flow Matching Models"
☆265 · Jun 24, 2026 · Updated 3 weeks ago
Simplified-Reasoning / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆459 · Mar 20, 2026 · Updated 4 months ago
wzb-bupt / VGPO
[ACL 2026] VGPO: Visually-Guided Policy Optimization for Multimodal Reasoning
☆31 · Apr 14, 2026 · Updated 3 months ago
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,587 · Updated this week
zhuchichi56 / ASFT
[ICLR 2026] The official implementation of the paper “Anchored Supervised Fine-Tuning”
☆47 · Jun 19, 2026 · Updated last month
XIAO4579 / PRISM
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
☆96 · May 6, 2026 · Updated 2 months ago
kokolerk / TCOD
[COLM 2026] TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents
☆83 · Jul 9, 2026 · Updated last week
Shenzhi-Wang / Beyond-the-80-20-Rule-RLVR
The open-source code for the NeurIPS 2025 paper, "Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learn…
☆60 · Jan 5, 2026 · Updated 6 months ago
songmzhang / DSKDv2
The official implementation of the paper "A Dual-Space Framework for General Knowledge Distillation of Large Language Models".
☆18 · Jan 4, 2026 · Updated 6 months ago
pUmpKin-Co / ComplementaryRL
Co-evolving policy actors and experience extractors for efficient experience-driven agent RL
☆51 · May 12, 2026 · Updated 2 months ago
Xuekai-Zhu / FlowRL
☆180 · Nov 24, 2025 · Updated 7 months ago