Lux0926 / ASPRMLinks

AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence

☆10

Alternatives and similar repositories for ASPRM

Users that are interested in ASPRM are comparing it to the libraries listed below

Sorting:

hkust-nlp / Laser
Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
☆59Updated 6 months ago
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆69Updated 4 months ago
zzli2022 / TLDR
Code for Research Project TLDR
☆24Updated 4 months ago
RyanLiu112 / GenPRM
[AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
☆90Updated 3 weeks ago
ssmisya / PRMBench
[ACL' 25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models.
☆84Updated 9 months ago
StarDewXXX / O1-Pruner
Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
☆97Updated 9 months ago
Blueyee / Efficient-CoT-LRMs
Chain of Thoughts (CoT) is so hot! so long! We need short reasoning process!
☆70Updated 7 months ago
WooooDyy / MathCritique
Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".
☆56Updated last year
GAIR-NLP / self-improvement-reversal
☆13Updated last year
ChnQ / MI-Peaks
☆55Updated 4 months ago
TianHongZXY / RLVR-Decomposed
[NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"
☆125Updated last month
THU-KEG / RM-Bench
[ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
☆68Updated 4 months ago
LightChen233 / reasoning-boundary
☆69Updated 5 months ago
sail-sg / CPO
[NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.
☆132Updated 8 months ago
zhangxy-2019 / critique-GRPO
☆46Updated last month
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆87Updated 9 months ago
limenlp / verl
AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
☆48Updated 5 months ago
sunnweiwei / FoldAgent
☆63Updated last month
AlphaLab-USTC / LRM-plans-CoT
[NeurIPS 2025] The implementation of paper "On Reasoning Strength Planning in Large Reasoning Models"
☆26Updated 4 months ago
iie-ycx / DEER
This is the repository of DEER, a Dynamic Early Exit in Reasoning method for Large Reasoning Language Models.
☆177Updated 4 months ago
WangHanLinHenry / SPA-RL-Agent
Official code for paper "SPA-RL: Reinforcing LLM Agent via Stepwise Progress Attribution"
☆54Updated 2 months ago
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆25Updated 11 months ago
ShadeCloak / ADORA
☆46Updated 7 months ago
Zanette-Labs / efficient-reasoning
☆67Updated 7 months ago
tianyi-lab / MiP-Overthinking
[COLM'25] Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
☆35Updated 5 months ago
bigai-nlco / LatentSeek
Official Repository of LatentSeek
☆69Updated 5 months ago
rhyang2021 / ARIA
Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation".
☆25Updated 3 months ago
zhyang2226 / AR-Lopti
[AI4MATH@ICML2025] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
☆40Updated 6 months ago
NineAbyss / S2R
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
☆72Updated 7 months ago
falonss703 / Awesome-Uncertainty-based-Reinforcement-Learning
🔥🔥🔥Latest Papers, Codes on Uncertainty-based RL
☆52Updated 3 months ago