ruixin31 / Rethink_RLVR

☆231

Alternatives and similar repositories for Rethink_RLVR

Users who are interested in Rethink_RLVR are comparing it to the repositories listed below.

TIGER-AI-Lab / verl-tool
A version of verl to support tool use
☆172 Updated this week
GAIR-NLP / ToRL
☆198 Updated last week
ryoungj / BoLT
Code for "Reasoning to Learn from Latent Thoughts"
☆104 Updated 2 months ago
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆149 Updated 2 months ago
limenlp / verl
AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
☆35 Updated 3 weeks ago
RyanLiu112 / GenPRM
Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
☆73 Updated last month
kanishkg / cognitive-behaviors
☆173 Updated 2 months ago
sail-sg / CPO
[NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.
☆123 Updated 2 months ago
LeapLabTHU / limit-of-RLVR
repo for paper https://arxiv.org/abs/2504.13837
☆139 Updated last week
Dereck0602 / Awesome_Test_Time_LLMs
☆105 Updated 2 months ago
ElliottYan / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆205 Updated this week
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆52 Updated this week
WooooDyy / MathCritique
Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".
☆54 Updated 6 months ago
princeton-nlp / ProLong
Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"
☆186 Updated 2 months ago
RyanLiu112 / Awesome-Process-Reward-Models
A comprehensive collection of process reward models.
☆85 Updated last week
cmu-l3 / l1
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
☆213 Updated 3 weeks ago
thu-wyz / inference_scaling
☆69 Updated 6 months ago
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆60 Updated 5 months ago
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆120 Updated 8 months ago
QingyangZhang / Label-Free-RLVR
☆57 Updated this week
hkust-nlp / llm-compression-intelligence
Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
☆136 Updated 8 months ago
hkust-nlp / dart-math
[NeurIPS'24] Official code for 🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving
☆106 Updated 5 months ago
OpenSparseLLMs / MoM
☆83 Updated last month
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆61 Updated 5 months ago
ypwang61 / One-Shot-RLVR
official repository for “Reinforcement Learning for Reasoning in Large Language Models with One Training Example”
☆257 Updated this week
RAGEN-AI / VAGEN
☆151 Updated this week
Joshua-Ren / Learning_dynamics_LLM
☆131 Updated 2 weeks ago
ShadeCloak / ADORA
☆45 Updated last month
GAIR-NLP / LIMR
☆201 Updated 3 months ago
ssmisya / PRMBench
The official code repository for PRMBench.
☆73 Updated 3 months ago