NVlabs / GDPO
Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
☆107Updated this week
Alternatives and similar repositories for GDPO
Users who are interested in GDPO are comparing it to the libraries listed below
- Geometric-Mean Policy Optimization☆96Updated last month
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆128Updated 5 months ago
- Visual Planning: Let's Think Only with Images☆292Updated 7 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25]☆174Updated 7 months ago
- official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"☆36Updated 11 months ago
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward☆88Updated 5 months ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S…☆234Updated last week
- [MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling."☆143Updated 5 months ago
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible☆114Updated 5 months ago
- ☆105Updated 7 months ago
- Official Implementation of LaViDa: A Large Diffusion Language Model for Multimodal Understanding☆187Updated 3 weeks ago
- Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model.☆96Updated last week
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework☆71Updated 7 months ago
- Multimodal RewardBench☆58Updated 10 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT☆113Updated 2 months ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision☆192Updated 3 weeks ago
- Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"☆388Updated 4 months ago
- ☆304Updated 3 weeks ago
- ☆191Updated 3 weeks ago
- DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models☆150Updated last week
- The official repository for the paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"☆135Updated this week
- This is the official repository of InfiniteVL☆68Updated 3 weeks ago
- Pixel-Level Reasoning Model trained with RL [NeurIPS25]☆260Updated 2 months ago
- [NeurIPS 2025] The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reason…☆152Updated 4 months ago
- ☆96Updated 6 months ago
- LLaDA2.0 is the diffusion language model series developed by InclusionAI team, Ant Group.☆218Updated 3 weeks ago
- 🚀ReVisual-R1 is a 7B open-source multimodal language model that follows a three-stage curriculum—cold-start pre-training, multimodal rei…☆191Updated last month
- Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"☆73Updated last month
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)☆226Updated 5 months ago
- TraceRL & TraDo-8B: Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models☆384Updated 3 weeks ago