lzhxmu / CPPO
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models
☆117Updated last week
Alternatives and similar repositories for CPPO:
Users who are interested in CPPO are comparing it to the repositories listed below
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆100Updated 2 months ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆115Updated 2 weeks ago
- ☆187Updated 2 months ago
- MMR1: Advancing the Frontiers of Multimodal Reasoning☆155Updated last month
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆191Updated last month
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"☆152Updated last month
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆173Updated last month
- ☆153Updated 3 weeks ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations☆75Updated last week
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆133Updated 4 months ago
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.☆114Updated 2 weeks ago
- ☆283Updated last month
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆64Updated last week
- An RLHF Infrastructure for Vision-Language Models☆171Updated 5 months ago
- ☆125Updated 3 weeks ago
- ☆73Updated 3 months ago
- ☆86Updated 2 weeks ago
- Paper List of Inference/Test Time Scaling/Computing☆195Updated this week
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs☆133Updated last month
- From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation☆88Updated last month
- A journey to a real multimodal R1! We are running large-scale experiments☆295Updated last month
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"☆61Updated this week
- [CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models☆184Updated 3 weeks ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆65Updated 2 months ago
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models☆92Updated this week
- A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning☆152Updated last week
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…☆92Updated 5 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆95Updated last month
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.☆174Updated last week
- The Next Step Forward in Multimodal LLM Alignment☆145Updated last month