Tim-Siu / reinforcement-distillation
Code repo for "Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning"
☆28 · Updated last month
Alternatives and similar repositories for reinforcement-distillation
Users who are interested in reinforcement-distillation are comparing it to the repositories listed below.
- Official Repo for SvS: A Self-play with Variational Problem Synthesis strategy for RLVR training ☆36 · Updated 3 weeks ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆85 · Updated 7 months ago
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆67 · Updated 2 months ago
- Official Repository of LatentSeek ☆60 · Updated 3 months ago
- ☆46 · Updated 5 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆84 · Updated 7 months ago
- 🔥🔥🔥 Latest papers and code on uncertainty-based RL ☆49 · Updated 3 weeks ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning" ☆81 · Updated 3 months ago
- ☆21 · Updated 4 months ago
- This repository contains the code for our ICML 2025 paper, LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection 🎉 ☆24 · Updated 3 months ago
- [ACL'25] UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench ☆32 · Updated last month
- [ACL 2025] SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs, and preprint SoftCoT++: Test-Time Scaling with Soft Chain-of… ☆46 · Updated 3 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆88 · Updated last month
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning ☆44 · Updated 3 months ago
- This is the repository of DEER, a Dynamic Early Exit in Reasoning method for Large Reasoning Language Models. ☆171 · Updated 2 months ago
- ☆25 · Updated last week
- ☆43 · Updated 5 months ago
- Official Repository of "Learning what reinforcement learning can't" ☆66 · Updated 2 weeks ago
- Segment Policy Optimization: Improved Credit Assignment in Reinforcement Learning for LLMs ☆32 · Updated this week
- [arXiv2505] Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains ☆50 · Updated last month
- ☆24 · Updated 4 months ago
- The official repository of NeurIPS'25 paper "Ada-R1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization" ☆18 · Updated this week
- ☆50 · Updated 2 months ago
- ☆60 · Updated 3 months ago
- [ACL'25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models. ☆81 · Updated 7 months ago
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation". ☆22 · Updated last month
- [ICLR 2025] Official codebase for "Multimodal Situational Safety" ☆23 · Updated 2 months ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025. ☆26 · Updated 7 months ago
- ☆19 · Updated 4 months ago
- [AI4MATH@ICML2025] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs ☆39 · Updated 4 months ago