Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
☆45Sep 19, 2025Updated 6 months ago
Alternatives and similar repositories for SPO
Users who are interested in SPO are comparing it to the libraries listed below.
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning☆62Oct 24, 2025Updated 4 months ago
- ☆56Jul 7, 2025Updated 8 months ago
- ☆27Jul 18, 2025Updated 8 months ago
- Short RL☆18May 26, 2025Updated 9 months ago
- Official Code For "Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models"☆20Mar 25, 2025Updated 11 months ago
- [NeurIPS 2025] The implementation of paper "On Reasoning Strength Planning in Large Reasoning Models"☆31Jul 6, 2025Updated 8 months ago
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"☆27Oct 14, 2025Updated 5 months ago
- ☆18Apr 10, 2025Updated 11 months ago
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models☆142Dec 17, 2025Updated 3 months ago
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25☆89Jun 16, 2025Updated 9 months ago
- ☆77Jun 28, 2025Updated 8 months ago
- THOUGHTSCULPT, a general reasoning and search method for complex tasks☆13Dec 13, 2024Updated last year
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression☆135Apr 12, 2025Updated 11 months ago
- R1V, trained with AI feedback, answers open-ended visual questions.☆14Apr 12, 2025Updated 11 months ago
- Code for "Adaptive Self-improvement LLM Agentic System for ML Library Development" (ICML 2025)☆15Jan 6, 2026Updated 2 months ago
- This is the official repository of the paper Exploring Superior Function Calls via Reinforcement Learning.☆34Aug 11, 2025Updated 7 months ago
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence☆10Mar 2, 2025Updated last year
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning☆21Feb 19, 2025Updated last year
- [ICCV 2025 Highlight] LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs☆20Nov 16, 2025Updated 4 months ago
- ☆74Jun 10, 2025Updated 9 months ago
- Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI, derived from Ling.☆107Aug 5, 2025Updated 7 months ago
- [TPAMI 2026] Official Repository of "AtomThink: Multimodal Slow Thinking with Atomic Step Reasoning"☆63Nov 18, 2025Updated 4 months ago
- Accelerating RL for LLM Reasoning with Optimal Advantage Regression☆39May 30, 2025Updated 9 months ago
- [ACL'25] Code for "Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering"☆21Jul 23, 2025Updated 7 months ago
- Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasonin…☆73Sep 8, 2025Updated 6 months ago
- FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient LLM Reasoning (EMNLP 2025)☆58Oct 10, 2025Updated 5 months ago
- ☆18Jul 24, 2025Updated 7 months ago
- ☆337May 24, 2025Updated 9 months ago
- ☆47Apr 9, 2025Updated 11 months ago
- Official implementation for DenseMixer: Improving MoE Post-Training with Precise Router Gradient☆66Aug 3, 2025Updated 7 months ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling☆186Jul 23, 2025Updated 7 months ago
- An official implementation of Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards☆37Oct 3, 2025Updated 5 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.☆424Jul 11, 2025Updated 8 months ago
- [EMNLP 2025 Main] AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time☆88Jun 10, 2025Updated 9 months ago
- The code for paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning"☆37Oct 1, 2025Updated 5 months ago
- [NIPS 2021] Code release for "Pareto Domain Adaptation"☆11Dec 13, 2021Updated 4 years ago
- This project contains my personal solutions to the MIT 6.5940 course assignments, along with study notes and reflections.☆15Mar 1, 2024Updated 2 years ago
- Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity☆22Aug 28, 2025Updated 6 months ago
- [ICLRW'26] EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation☆29Updated this week