lzhxmu / CPPO
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models (NeurIPS 2025)
☆167 Updated 2 weeks ago
Alternatives and similar repositories for CPPO
Users that are interested in CPPO are comparing it to the libraries listed below
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning. ☆160 Updated last month
- Extrapolating RLVR to General Domains without Verifiers ☆179 Updated 3 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆257 Updated 6 months ago
- ☆165 Updated last month
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too…