LeapLabTHU / limit-of-RLVR

Repository for the paper https://arxiv.org/abs/2504.13837

☆271

Alternatives and similar repositories for limit-of-RLVR

Users who are interested in limit-of-RLVR are comparing it to the libraries listed below.

ruixin31 / Spurious_Rewards
☆344 · Updated 4 months ago
multimodal-art-projection / LatentCoT-Horizon
📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning.
☆290 · Updated last month
ltzheng / SimpleTIR
End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆330 · Updated 2 months ago
zitian-gao / one-shot-em
One-shot Entropy Minimization
☆187 · Updated 5 months ago
cmu-l3 / l1
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
☆257 · Updated 6 months ago
ypwang61 / One-Shot-RLVR
[NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example
☆383 · Updated 2 weeks ago
TsinghuaC3I / Unify-Post-Training
Towards a Unified View of Large Language Model Post-Training
☆191 · Updated 2 months ago
ElliottYan / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆380 · Updated 2 months ago
OpenBMB / RLPR
Extrapolating RLVR to General Domains without Verifiers
☆180 · Updated 3 months ago
PRIME-RL / Entropy-Mechanism-of-RL
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
☆390 · Updated 4 months ago
GAIR-NLP / ToRL
☆316 · Updated 6 months ago
Joshua-Ren / Learning_dynamics_LLM
☆185 · Updated 6 months ago
eric-ai-lab / Soft-Thinking
Official implementation of the NeurIPS 2025 paper "Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space"
☆278 · Updated 3 weeks ago
GAIR-NLP / LIMR
☆213 · Updated 9 months ago
zwhe99 / DeepMath
A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
☆279 · Updated 2 months ago
InternLM / OREAL
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
☆190 · Updated 8 months ago
lzhxmu / CPPO
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models (NeurIPS 2025)
☆167 · Updated last month
mll-lab-nu / VAGEN
Training VLM agents with multi-turn reinforcement learning
☆338 · Updated this week
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆167 · Updated 8 months ago
kanishkg / cognitive-behaviors
☆216 · Updated 8 months ago
CMU-AIRe / MRT
Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".
☆116 · Updated 4 months ago
StarDewXXX / O1-Pruner
Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
☆98 · Updated 9 months ago
RyanLiu112 / GenPRM
[AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
☆90 · Updated 3 weeks ago
TIGER-AI-Lab / VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25]
☆168 · Updated 6 months ago
facebookresearch / sweet_rl
Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks"
☆253 · Updated 7 months ago
UCSC-VLAA / VLAA-Thinking
[TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
☆143 · Updated last month
fscdc / Awesome-Efficient-Reasoning-Models
[TMLR 2025] Efficient Reasoning Models: A Survey
☆282 · Updated last month
InternLM / POLAR
Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning.
☆160 · Updated 2 months ago
dvlab-research / ARPO
Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
☆138 · Updated 6 months ago
TIGER-AI-Lab / General-Reasoner
General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]
☆204 · Updated last week