ZJU-REAL / Mind-the-Gap
[NeurIPS 2025] Mind the Gap: Bridging Thought Leap for Improved CoT Tuning https://arxiv.org/abs/2505.14684
☆41 · Updated 3 weeks ago
Alternatives and similar repositories for Mind-the-Gap
Users who are interested in Mind-the-Gap are comparing it to the libraries listed below.
- A Unified Framework for High-Performance and Extensible LLM Steering ☆79 · Updated this week
- ☆29 · Updated 2 months ago
- ☆36 · Updated last week
- 🔧 Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning ☆265 · Updated last month
- [NeurIPS 2025] Code for Let LLMs Break Free from Overthinking via Self-Braking Tuning. https://arxiv.org/abs/2505.14604 ☆48 · Updated 2 weeks ago
- This repository is the official implementation of TimeHC-RL (Distilabel (Data Generation) + TRL (SFT) + VeRL (GRPO)). ☆49 · Updated 4 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆86 · Updated 8 months ago
- GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts ☆31 · Updated 2 weeks ago
- Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents ☆190 · Updated 5 months ago
- ☆29 · Updated last month
- ☆26 · Updated 4 months ago
- Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models ☆37 · Updated 3 weeks ago
- ☆38 · Updated 2 months ago
- Official Repo for SvS: A Self-play with Variational Problem Synthesis strategy for RLVR training ☆39 · Updated last month
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025. ☆26 · Updated 8 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆56 · Updated 10 months ago
- Extrapolating RLVR to General Domains without Verifiers ☆173 · Updated 2 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆82 · Updated 4 months ago
- SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks ☆96 · Updated last month
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI ☆182 · Updated last month
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆130 · Updated 4 months ago
- Code repo for "Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning" ☆29 · Updated 2 months ago
- 🔥🔥🔥 Latest Papers, Codes on Uncertainty-based RL ☆50 · Updated last month
- Official code for paper "SPA-RL: Reinforcing LLM Agent via Stepwise Progress Attribution" ☆45 · Updated last month
- [ACL'25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models. ☆81 · Updated 8 months ago
- ☆156 · Updated last week
- ☆21 · Updated 5 months ago
- ACL'2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs, and the preprint SoftCoT++: Test-Time Scaling with Soft Chain-of… ☆54 · Updated 4 months ago
- Official Repository of LatentSeek ☆64 · Updated 4 months ago
- ☆68 · Updated 4 months ago