Tim-Siu / reinforcement-distillation
Code repo for "Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning"
☆30Updated 4 months ago
Alternatives and similar repositories for reinforcement-distillation
Users who are interested in reinforcement-distillation are comparing it to the repositories listed below.
- Official Repo for SvS: A Self-play with Variational Problem Synthesis strategy for RLVR training☆43Updated 3 months ago
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆90Updated 3 weeks ago
- Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models☆42Updated 2 months ago
- ☆30Updated last week
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025.☆27Updated 9 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆87Updated 9 months ago
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning☆69Updated 4 months ago
- Official Repository of "Learning what reinforcement learning can't"☆69Updated last week
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…☆82Updated 5 months ago
- The official repository of NeurIPS'25 paper "Ada-R1: From Long-Cot to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization"☆20Updated 2 weeks ago
- Official Repository of LatentSeek☆69Updated 5 months ago
- ☆45Updated 2 months ago
- ☆46Updated 7 months ago
- ☆22Updated 6 months ago
- [ICML 2025] Official implementation of the paper "SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling". …☆15Updated last week
- This repository contains the code for our ICML 2025 paper "LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection"🎉☆24Updated 6 months ago
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning☆48Updated 5 months ago
- 🔥🔥🔥Latest Papers, Codes on Uncertainty-based RL☆52Updated 3 months ago
- ☆38Updated 3 months ago
- [NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation☆98Updated 2 months ago
- [AI4MATH@ICML2025] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs☆40Updated 6 months ago
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25☆82Updated 5 months ago
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation".☆25Updated 3 months ago
- ☆32Updated 6 months ago
- ☆21Updated 6 months ago
- One-shot Entropy Minimization☆187Updated 5 months ago
- ☆184Updated 6 months ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning☆49Updated last month
- Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping☆59Updated 6 months ago
- The official repository for the paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"☆111Updated last week