AIFrameResearch / SPOLinks

Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models

☆41

Alternatives and similar repositories for SPO

Users that are interested in SPO are comparing it to the libraries listed below

Sorting:

RyanLiu112 / GenPRM
Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
☆85Updated 5 months ago
ltzheng / SimpleTIR
End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆314Updated last month
StarDewXXX / O1-Pruner
Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
☆97Updated 8 months ago
cmu-l3 / l1
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
☆259Updated 5 months ago
limenlp / verl
AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
☆46Updated 4 months ago
NineAbyss / S2R
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
☆71Updated 6 months ago
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆86Updated 8 months ago
WooooDyy / MathCritique
Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".
☆56Updated 11 months ago
zjunlp / LightThinker
[EMNLP 2025] LightThinker: Thinking Step-by-Step Compression
☆118Updated 6 months ago
lichengliu03 / unary-feedback
☆38Updated 2 months ago
StarDewXXX / AdaR1
The official repository of NeurIPS'25 paper "Ada-R1: From Long-Cot to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization"
☆20Updated last month
xuyige / SoftCoT
ACL'2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs. and preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of…
☆57Updated 5 months ago
OpenBMB / RLPR
Extrapolating RLVR to General Domains without Verifiers
☆177Updated 2 months ago
GAIR-NLP / ToRL
☆307Updated 5 months ago
ssmisya / PRMBench
[ACL' 25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models.
☆83Updated 8 months ago
UCSB-NLP-Chang / ThinkPrune
☆45Updated last month
RUC-NLPIR / Tool-Star
🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning
☆282Updated 2 weeks ago
IAAR-Shanghai / xVerify
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
☆136Updated 6 months ago
Raibows / CREAM
Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025.
☆27Updated 8 months ago
ElliottYan / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆360Updated last month
hkgc-1 / GHPO
☆51Updated 3 months ago
Blueyee / Efficient-CoT-LRMs
Chain of Thoughts (CoT) is so hot! so long! We need short reasoning process!
☆69Updated 7 months ago
RyanLiu112 / Awesome-Process-Reward-Models
A comprehensive collection of process reward models.
☆116Updated last month
THU-KEG / AdaptThink
☆162Updated 3 weeks ago
LeapLabTHU / limit-of-RLVR
repo for paper https://arxiv.org/abs/2504.13837
☆203Updated 4 months ago
MasterVito / SvS
Official Repo for SvS: A Self-play with Variational Problem Synthesis strategy for RLVR training
☆40Updated 2 months ago
TsinghuaC3I / Unify-Post-Training
Towards a Unified View of Large Language Model Post-Training
☆178Updated 2 months ago
PRIME-RL / Entropy-Mechanism-of-RL
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
☆366Updated 3 months ago
THUDM / TreeRL
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25
☆79Updated 4 months ago
bigai-nlco / LatentSeek
Official Repository of LatentSeek
☆66Updated 5 months ago