lzhxmu / CPPOLinks

CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models

☆147

Alternatives and similar repositories for CPPO

Users that are interested in CPPO are comparing it to the libraries listed below

Sorting:

OpenBMB / RLPR
Extrapolating RLVR to General Domains without Verifiers
☆134Updated last week
InternLM / POLAR
Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning.
☆146Updated 3 weeks ago
THU-KEG / AdaptThink
☆140Updated 2 months ago
cmu-l3 / l1
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
☆237Updated 2 months ago
GAIR-NLP / LIMR
☆206Updated 5 months ago
maple-research-lab / SLOT
☆101Updated last month
GAIR-NLP / MAYE
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
☆138Updated 4 months ago
multimodal-art-projection / LatentCoT-Horizon
📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning.
☆176Updated this week
PRIME-RL / Entropy-Mechanism-of-RL
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
☆282Updated 3 weeks ago
yafuly / TPO
Test-time preferenece optimization (ICML 2025).
☆158Updated 3 months ago
MiniMax-AI / One-RL-to-See-Them-All
The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning
☆309Updated 2 months ago
InternLM / OREAL
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
☆188Updated 4 months ago
zitian-gao / one-shot-em
One-shot Entropy Minimization
☆175Updated last month
IAAR-Shanghai / xVerify
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
☆127Updated 3 months ago
dongguanting / Tool-Star
Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning
☆225Updated last week
RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆105Updated 2 months ago
ADaM-BJTU / OpenRFT
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
☆147Updated 7 months ago
ruixin31 / Spurious_Rewards
☆323Updated last week
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆82Updated 5 months ago
OpenRLHF / OpenRLHF-M
An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.
☆139Updated 4 months ago
LeapLabTHU / limit-of-RLVR
repo for paper https://arxiv.org/abs/2504.13837
☆180Updated last month
RyanLiu112 / GenPRM
Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
☆81Updated 2 months ago
AMAP-ML / GPG
GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning
☆157Updated 2 months ago
ElliottYan / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆271Updated 3 weeks ago
EvolvingLMMs-Lab / multimodal-search-r1
MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too…
☆284Updated this week
qiancheng0 / ToolRL
☆309Updated 2 months ago
eddycmu / demystify-long-cot
☆310Updated 2 months ago
GAIR-NLP / ToRL
☆263Updated 2 months ago
hemingkx / TokenSkip
TokenSkip: Controllable Chain-of-Thought Compression in LLMs
☆171Updated last month
fscdc / Awesome-Efficient-Reasoning-Models
[arXiv 2025] Efficient Reasoning Models: A Survey
☆247Updated 2 weeks ago