Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
☆400Feb 17, 2026Updated 3 weeks ago
Alternatives and similar repositories for GDPO
Users who are interested in GDPO are comparing it to the repositories listed below
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation".☆26Aug 9, 2025Updated 7 months ago
- Rethinking the Trust Region in LLM Reinforcement Learning☆43Mar 2, 2026Updated last week
- [ICLR-2026] Official Implementation of our paper "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning".☆32Feb 26, 2026Updated last week
- IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement☆18Mar 7, 2025Updated last year
- [ICLR 2026] Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing☆29Feb 6, 2026Updated last month
- Defeating the Training-Inference Mismatch via FP16☆183Nov 14, 2025Updated 3 months ago
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1"☆31Feb 22, 2026Updated 2 weeks ago
- ☆31Sep 12, 2025Updated 5 months ago
- Test-time Scaling for VAR models☆30Sep 19, 2025Updated 5 months ago
- ☆30Jan 18, 2026Updated last month
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization☆81Dec 25, 2025Updated 2 months ago
- ☆25Aug 19, 2025Updated 6 months ago
- The SAIL-VL2 series model developed by the BytedanceDouyinContent Group☆76Sep 18, 2025Updated 5 months ago
- The source code of the paper "RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos"☆79Jan 5, 2026Updated 2 months ago
- Official implementation of Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning☆241Feb 10, 2026Updated last month
- Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization☆21Jan 27, 2026Updated last month
- Implementation of the ACL Findings paper "OutFlip: Generating Examples for Unknown Intent Detection with Natural Language Attack"☆10May 24, 2021Updated 4 years ago
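- An open-source mathematical large-model project that explores whether large models have mathematical creativity and what potential they hold for frontier mathematical research.☆17May 16, 2025Updated 9 months ago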
- ☆15Nov 18, 2025Updated 3 months ago
- ☆15Jan 12, 2026Updated last month
- Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents☆23Feb 21, 2026Updated 2 weeks ago
- Code and resources for the NeurIPS 2025 Paper "BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset" by Zhiheng X…☆19Oct 14, 2025Updated 4 months ago
- [ICLR 2025] MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow☆26Apr 9, 2025Updated 11 months ago
- Code for Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation (EVOL-RL).☆48Oct 16, 2025Updated 4 months ago
- ☆64Jan 12, 2026Updated last month
- [arxiv: 2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies☆61Feb 6, 2026Updated last month
- Official implementation of CharacterShot: Controllable and Consistent 4D Character Animation☆49Feb 27, 2026Updated last week
- [ICLR26] GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning☆104Jan 27, 2026Updated last month
- ☆57Aug 16, 2025Updated 6 months ago
- [ICLR 2025] Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization☆12Jan 26, 2025Updated last year
- Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation☆28Dec 10, 2025Updated 3 months ago
- ☆55Jan 15, 2026Updated last month
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".☆16Feb 9, 2026Updated last month
- [SIGIR 2025] This is the code repo for our SIGIR'25 paper: Enhancing the Patent Matching Capability of Large Language Models via Memory G…☆19Apr 22, 2025Updated 10 months ago
- CAMM: Building Category-Agnostic and Animatable 3D Models from Monocular Videos☆13Jun 14, 2024Updated last year
- ☆14Apr 25, 2025Updated 10 months ago
- [NeurIPS 2023] Implementation of "Template-free Articulated Neural Point Clouds for Reposable View"☆35Aug 26, 2024Updated last year
- This repo contains the python code as well as the webpage html files for the Spice-E project from VAILab at TAU.☆27Dec 9, 2024Updated last year
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆92Feb 14, 2025Updated last year