TianHongZXY / RLVR-DecomposedLinks

[NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"

☆142

Alternatives and similar repositories for RLVR-Decomposed

Users that are interested in RLVR-Decomposed are comparing it to the libraries listed below

Sorting:

sail-sg / CPO
[NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.
☆134Updated 9 months ago
rookie-joe / AutoPSV
☆51Updated last year
kanishkg / cognitive-behaviors
☆218Updated 9 months ago
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆76Updated 2 months ago
THU-KEG / RM-Bench
[ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
☆73Updated 5 months ago
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆168Updated 9 months ago
QwenLM / ProcessBench
Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"
☆181Updated 7 months ago
SparkJiao / dpo-trajectory-reasoning
[EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".
☆83Updated 11 months ago
ssmisya / PRMBench
[ACL' 25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models.
☆86Updated 10 months ago
SihengLi99 / LLM-Honesty-Survey
[2025-TMLR] A Survey on the Honesty of Large Language Models
☆64Updated last year
RM-R1-UIUC / RM-R1
RM-R1: Unleashing the Reasoning Potential of Reward Models
☆154Updated 6 months ago
genrm-star / genrm-critiques
GenRM-CoT: Data release for verification rationales
☆67Updated last year
hkust-nlp / dart-math
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
☆119Updated last year
zjunlp / LightThinker
[EMNLP 2025] LightThinker: Thinking Step-by-Step Compression
☆126Updated 8 months ago
PKU-Alignment / aligner
[NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct
☆191Updated 11 months ago
WooooDyy / MathCritique
Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".
☆56Updated last year
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆92Updated last month
GAIR-NLP / LIMR
☆213Updated 10 months ago
NineAbyss / S2R
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
☆72Updated 8 months ago
multimodal-art-projection / KORGym
☆52Updated 7 months ago
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆26Updated last year
YangLing0818 / SuperCorrect-llm
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
☆86Updated 9 months ago
hahahawu / Long-to-Short-via-Model-Merging
Model merging is a highly efficient approach for long-to-short reasoning.
☆94Updated 2 months ago
ruixin31 / Spurious_Rewards
☆346Updated 4 months ago
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆118Updated 7 months ago
Joshua-Ren / Learning_dynamics_LLM
☆197Updated this week
inclusionAI / PromptCoT
A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architect…
☆131Updated 2 months ago
FreedomIntelligence / OVM
☆70Updated last year
KbsdJames / Omni-MATH
The official repository of the Omni-MATH benchmark.
☆89Updated last year
RLHFlow / RAFT
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…
☆38Updated last year