Tim-Siu / reinforcement-distillation
☆23Updated 2 weeks ago
Alternatives and similar repositories for reinforcement-distillation
Users who are interested in reinforcement-distillation are comparing it to the repositories listed below.
- This repository contains the code for our ICML 2025 paper, LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection🎉☆22Updated last month
- [ACL'25] UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench☆19Updated this week
- [ICML2025] Official Code of From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection☆14Updated this week
- ☆46Updated 2 months ago
- SFT+RL boosts multimodal reasoning☆14Updated this week
- (ArXiv25) Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning☆37Updated 2 weeks ago
- ☆22Updated last week
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models☆36Updated last week
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025.☆22Updated 4 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation☆69Updated 3 weeks ago
- Code release for VTW (AAAI 2025) Oral☆43Updated 5 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆75Updated 3 weeks ago
- ☆16Updated 5 months ago
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation".☆18Updated last week
- Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models"☆19Updated 3 months ago
- ☆139Updated last month
- Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)☆62Updated 3 weeks ago
- A Self-Training Framework for Vision-Language Reasoning☆80Updated 5 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆73Updated 4 months ago
- ☆18Updated last month
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization☆38Updated last month
- Official repository of the video reasoning benchmark MMR-V. Can Your MLLMs "Think with Video"?☆31Updated this week
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…☆73Updated last week
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"☆119Updated 3 weeks ago
- ☆19Updated 2 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆16Updated 6 months ago
- Official Repository of "Learning what reinforcement learning can't"☆32Updated last week
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning☆64Updated last month
- ☆80Updated 5 months ago
- [ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality☆32Updated last month