Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
☆45Sep 19, 2025Updated 5 months ago
Alternatives and similar repositories for SPO
Users who are interested in SPO are comparing it to the repositories listed below.
- ☆55Jul 7, 2025Updated 7 months ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning☆62Oct 24, 2025Updated 4 months ago
- An Ultra-Long Output Reinforcement Learning Approach☆23Jul 31, 2025Updated 7 months ago
- ☆18Apr 10, 2025Updated 10 months ago
- ☆19Dec 20, 2025Updated 2 months ago
- Official Code For "Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models"☆19Mar 25, 2025Updated 11 months ago
- ☆27Jul 18, 2025Updated 7 months ago
- Artifact evaluation of MobiSys25 SynCheck☆19Mar 24, 2025Updated 11 months ago
- [NeurIPS 2025] The implementation of paper "On Reasoning Strength Planning in Large Reasoning Models"☆30Jul 6, 2025Updated 7 months ago
- Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasonin…☆71Sep 8, 2025Updated 5 months ago
- ☆74Jun 28, 2025Updated 8 months ago
- Official implementation of CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding.☆48Sep 15, 2025Updated 5 months ago
- ☆33Jul 15, 2025Updated 7 months ago
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning☆21Feb 19, 2025Updated last year
- Official implementation of the paper "Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Rou…☆34Sep 25, 2025Updated 5 months ago
- [EMNLP 2025 Main] AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time☆88Jun 10, 2025Updated 8 months ago
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25☆89Jun 16, 2025Updated 8 months ago
- [ICML 2025] This is the official PyTorch implementation of "🎵 HarmoniCa: Harmonizing Training and Inference for Better Feature Caching i…☆45Jul 10, 2025Updated 7 months ago
- Scaling In-context Learning from Few-shot to 1,024-shot on Tabular ML☆59Dec 12, 2025Updated 2 months ago
- ☆50Sep 18, 2025Updated 5 months ago
- MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025)☆35Jul 16, 2025Updated 7 months ago
- Code for ACL 2025 Main paper "Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning…☆46Aug 4, 2025Updated 6 months ago
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models☆142Dec 17, 2025Updated 2 months ago
- [AAAI 2026] Test-Time Reinforcement Learning for GUI Grounding via Region Consistency https://arxiv.org/abs/2508.05615☆61Nov 8, 2025Updated 3 months ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types☆32Jul 16, 2025Updated 7 months ago
- ☆114Sep 13, 2025Updated 5 months ago
- ☆55Feb 2, 2026Updated last month
- Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI, derived from Ling.☆107Aug 5, 2025Updated 6 months ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling☆183Jul 23, 2025Updated 7 months ago
- Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]☆45Jul 22, 2025Updated 7 months ago
- Official implementation for DenseMixer: Improving MoE Post-Training with Precise Router Gradient☆66Aug 3, 2025Updated 6 months ago
- Official implementation of "Auto-Regressively Generating Multi-View Consistent Images" (ICCV 2025)☆84Jul 26, 2025Updated 7 months ago
- A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architect…☆134Jan 31, 2026Updated last month
- The official implementation of "ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering"☆56Jun 21, 2025Updated 8 months ago
- ☆72Jun 10, 2025Updated 8 months ago
- ☆95Feb 4, 2026Updated 3 weeks ago
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression☆133Apr 12, 2025Updated 10 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.☆421Jul 11, 2025Updated 7 months ago
- ☆164Jan 21, 2025Updated last year