Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
☆47 · Sep 19, 2025 · Updated 7 months ago
Alternatives and similar repositories for SPO
Users that are interested in SPO are comparing it to the libraries listed below.
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning [ICLR26] ☆64 · Apr 11, 2026 · Updated 3 weeks ago
- An Ultra-Long Output Reinforcement Learning Approach ☆23 · Jul 31, 2025 · Updated 9 months ago
- ☆56 · Jul 7, 2025 · Updated 9 months ago
- ☆28 · Jul 18, 2025 · Updated 9 months ago
- Short RL ☆18 · Apr 16, 2026 · Updated 2 weeks ago
- [ACL2026 Findings] "Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models" ☆20 · Mar 25, 2025 · Updated last year
- [NeurIPS 2025] The implementation of paper "On Reasoning Strength Planning in Large Reasoning Models" ☆32 · Jul 6, 2025 · Updated 9 months ago
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining" ☆27 · Oct 14, 2025 · Updated 6 months ago
- ☆18 · Apr 10, 2025 · Updated last year
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models ☆143 · Dec 17, 2025 · Updated 4 months ago
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25 ☆90 · Jun 16, 2025 · Updated 10 months ago
- ☆78 · Jun 28, 2025 · Updated 10 months ago
- THOUGHTSCULPT, a general reasoning and search method for complex tasks ☆13 · Dec 13, 2024 · Updated last year
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression ☆154 · Apr 7, 2026 · Updated 3 weeks ago
- Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025] ☆48 · Jul 22, 2025 · Updated 9 months ago
- This is the official repository of the paper Exploring Superior Function Calls via Reinforcement Learning. ☆34 · Aug 11, 2025 · Updated 8 months ago
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence ☆10 · Mar 2, 2025 · Updated last year
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning ☆21 · Feb 19, 2025 · Updated last year
- A reconstruction of RL Introduction and its course materials for a more efficient entry ☆22 · Mar 4, 2026 · Updated last month
- [ICLR 2026] RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling ☆37 · Feb 25, 2026 · Updated 2 months ago
- An asynchronous streaming data management module for efficient post-training. ☆63 · Updated this week
- ☆73 · Jun 10, 2025 · Updated 10 months ago
- [TPAMI 2026] Official Repository of "AtomThink: Multimodal Slow Thinking with Atomic Step Reasoning" ☆65 · Nov 18, 2025 · Updated 5 months ago
- ☆10 · Mar 1, 2024 · Updated 2 years ago
- Accelerating RL for LLM Reasoning with Optimal Advantage Regression ☆40 · May 30, 2025 · Updated 11 months ago
- [ACL'25] Code for "Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering" ☆21 · Jul 23, 2025 · Updated 9 months ago
- Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasonin… ☆77 · Sep 8, 2025 · Updated 7 months ago
- FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient LLM Reasoning (EMNLP 2025) ☆59 · Oct 10, 2025 · Updated 6 months ago
- ☆18 · Jul 24, 2025 · Updated 9 months ago
- ☆345 · May 24, 2025 · Updated 11 months ago
- ☆47 · Apr 9, 2025 · Updated last year
- Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner" ☆15 · Aug 9, 2023 · Updated 2 years ago
- An official implementation of Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards ☆36 · Oct 3, 2025 · Updated 7 months ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling ☆188 · Jul 23, 2025 · Updated 9 months ago
- Implementation of: Hydra Attention: Efficient Attention with Many Heads (https://arxiv.org/abs/2209.07484) ☆14 · Jan 8, 2023 · Updated 3 years ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆435 · Jul 11, 2025 · Updated 9 months ago
- Container-free RL framework for training software engineering agents ☆54 · Mar 4, 2026 · Updated last month
- [EMNLP 2025 Main] AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time ☆90 · Jun 10, 2025 · Updated 10 months ago
- The code for paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning" ☆38 · Oct 1, 2025 · Updated 7 months ago