NineAbyss / S2RLinks

This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"

☆69

Alternatives and similar repositories for S2R

Users that are interested in S2R are comparing it to the libraries listed below

Sorting:

rookie-joe / AutoPSV
☆50Updated 11 months ago
RyanLiu112 / GenPRM
Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
☆83Updated 4 months ago
WooooDyy / MathCritique
Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".
☆56Updated 10 months ago
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆23Updated 10 months ago
THU-KEG / RM-Bench
[ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
☆63Updated 3 months ago
CMU-AIRe / MRT
Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".
☆112Updated 2 months ago
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆114Updated 5 months ago
rhyang2021 / ARIA
Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation".
☆23Updated 2 months ago
sail-sg / CPO
[NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.
☆130Updated 7 months ago
IAAR-Shanghai / xVerify
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
☆135Updated 6 months ago
limenlp / verl
AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
☆46Updated 4 months ago
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆164Updated 7 months ago
zjunlp / LightThinker
[EMNLP 2025] LightThinker: Thinking Step-by-Step Compression
☆112Updated 6 months ago
RLHFlow / RAFT
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…
☆37Updated last year
test-time-interaction / TTI
☆63Updated 4 months ago
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆85Updated 2 weeks ago
RM-R1-UIUC / RM-R1
RM-R1: Unleashing the Reasoning Potential of Reward Models
☆140Updated 3 months ago
icip-cas / Verifier-Engineering
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
☆61Updated 10 months ago
KbsdJames / Omni-MATH
The official repository of the Omni-MATH benchmark.
☆88Updated 10 months ago
ruixin31 / Spurious_Rewards
☆333Updated 2 months ago
GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆177Updated 3 months ago
TianHongZXY / RLVR-Decomposed
[NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"
☆113Updated last month
YangLing0818 / SuperCorrect-llm
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
☆82Updated 7 months ago
ryoungj / BoLT
Code for "Reasoning to Learn from Latent Thoughts"
☆121Updated 6 months ago
hkust-nlp / dart-math
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
☆115Updated 10 months ago
ssmisya / PRMBench
[ACL' 25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models.
☆81Updated 8 months ago
kanishkg / cognitive-behaviors
☆210Updated 6 months ago
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆69Updated 3 months ago
WindyLee0822 / Process_Q_Model
official implementation of paper "Process Reward Model with Q-value Rankings"
☆64Updated 8 months ago
LightChen233 / reasoning-boundary
☆68Updated 4 months ago