Code repo for "Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning"
☆33 · Jul 25, 2025 · Updated 7 months ago
Alternatives and similar repositories for reinforcement-distillation
Users who are interested in reinforcement-distillation are comparing it to the libraries listed below.
- ☆23 · Jan 9, 2026 · Updated last month
- Code and data for paper "Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation?" (ACL 2025 Main) · ☆21 · Jun 18, 2025 · Updated 8 months ago
- This repository contains the code for our ICML 2025 paper, LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection 🎉 · ☆26 · May 29, 2025 · Updated 9 months ago
- [ACL'25] UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench · ☆35 · Aug 12, 2025 · Updated 6 months ago
- A holistic framework for advancing LLMs as data science agents · ☆33 · Feb 3, 2026 · Updated last month
- Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning · ☆14 · Jun 28, 2025 · Updated 8 months ago
- ☆21 · Nov 27, 2025 · Updated 3 months ago
- Official code for DeepSound-V1 · ☆13 · May 14, 2025 · Updated 9 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment · ☆16 · Dec 19, 2024 · Updated last year
- [ICCV 2025] Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation · ☆51 · Aug 27, 2025 · Updated 6 months ago
- From Word to World: Can Large Language Models be Implicit Text-based World Models? · ☆48 · Dec 25, 2025 · Updated 2 months ago
- DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning… · ☆28 · Sep 7, 2025 · Updated 5 months ago
- 【ICLR 2025 🔥】The code for Consistent In-Context Editing, an approach for tuning language models through contextual distributions, overco… · ☆48 · Apr 2, 2025 · Updated 11 months ago
- [ICML2025] Official Code of From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection · ☆25 · Jun 27, 2025 · Updated 8 months ago
- The OlymMATH dataset · ☆23 · Jun 1, 2025 · Updated 9 months ago
- ☆17 · Aug 1, 2025 · Updated 7 months ago
- ☆31 · Sep 12, 2025 · Updated 5 months ago
- [ICLR 2025] The official implementation of "PSEC: Skill Expansion and Composition in Parameter Space", a new framework designed to facilit… · ☆63 · Feb 12, 2025 · Updated last year
- MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision · ☆27 · May 26, 2025 · Updated 9 months ago
- [ICML 2025] This is the official PyTorch implementation of "🎵 HarmoniCa: Harmonizing Training and Inference for Better Feature Caching i… · ☆45 · Jul 10, 2025 · Updated 7 months ago
- Reconstructing spatiotemporal dynamics from spatial transcriptome snapshots · ☆34 · Jun 26, 2025 · Updated 8 months ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" · ☆31 · Dec 23, 2024 · Updated last year
- ☆25 · Apr 10, 2025 · Updated 10 months ago
- Process Reward Models That Think · ☆80 · Nov 29, 2025 · Updated 3 months ago
- [NeurIPS 2025] ScaleKV: Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression · ☆50 · Nov 4, 2025 · Updated 4 months ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… · ☆12 · Jun 28, 2025 · Updated 8 months ago
- A Text2SQL benchmark for evaluation of Large Language Models · ☆41 · Updated this week
- [ICML'25] Official code of paper "Fast Large Language Model Collaborative Decoding via Speculation" · ☆28 · Jun 23, 2025 · Updated 8 months ago
- ☆67 · Dec 7, 2025 · Updated 2 months ago
- [AAAI'26 Oral] Official Implementation of STAR-1: Safer Alignment of Reasoning LLMs with 1K Data · ☆33 · Apr 7, 2025 · Updated 10 months ago
- ☆18 · Jun 10, 2025 · Updated 8 months ago
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios · ☆16 · Oct 18, 2024 · Updated last year
- [ICML 2025] Official resources of "KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search" · ☆35 · Dec 6, 2025 · Updated 3 months ago
- 🔧 Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning · ☆320 · Jan 3, 2026 · Updated 2 months ago
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton · ☆42 · Feb 13, 2025 · Updated last year
- ☆63 · Jul 11, 2025 · Updated 7 months ago
- (ACL 2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation · ☆34 · May 28, 2025 · Updated 9 months ago
- MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models · ☆32 · Jan 22, 2025 · Updated last year
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective · ☆42 · Sep 18, 2025 · Updated 5 months ago