[ICLR 2026] Geometric-Mean Policy Optimization
☆100 Jan 26, 2026 Updated last month
Alternatives and similar repositories for GMPO
Users who are interested in GMPO are comparing it to the repositories listed below.
- [ICLR26] GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning ☆174 Jan 29, 2026 Updated last month
- [ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style ☆76 Jul 18, 2025 Updated 7 months ago
- Official Implementation of HIMA (COLM'25) ☆19 Nov 25, 2025 Updated 3 months ago
- ☆33 Nov 18, 2025 Updated 3 months ago
- ☆14 Jan 24, 2025 Updated last year
- [EMNLP 2024 Tutorial] Language Agents: Foundations, Prospects, and Risks ☆10 Nov 27, 2024 Updated last year
- ☆60 Jan 12, 2026 Updated last month
- ☆28 May 24, 2025 Updated 9 months ago
- ☆13 May 12, 2025 Updated 9 months ago
- ☆11 May 18, 2025 Updated 9 months ago
- PyTorch code for our paper "Progressive Binarization with Semi-Structured Pruning for LLMs" ☆13 Sep 28, 2025 Updated 5 months ago
- Codes for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems" ☆13 Dec 13, 2024 Updated last year
- CMU Linguistic Annotation Backend ☆15 Sep 22, 2025 Updated 5 months ago
- ☆13 Sep 12, 2024 Updated last year
- Rethinking the Trust Region in LLM Reinforcement Learning ☆39 Feb 25, 2026 Updated last week
- ☆11 Aug 26, 2021 Updated 4 years ago
- ☆52 Mar 17, 2025 Updated 11 months ago
- The original Shared Recurrent Memory Transformer implementation ☆33 Jul 11, 2025 Updated 7 months ago
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25 ☆89 Jun 16, 2025 Updated 8 months ago
- The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction. ☆51 Oct 18, 2024 Updated last year
- Codebase for Instruction Following without Instruction Tuning ☆36 Sep 24, 2024 Updated last year
- [ICLR 2026] Quantile Advantage Estimation for Entropy-Safe Reasoning ☆23 Oct 14, 2025 Updated 4 months ago
- Audio Masking Methods ☆12 Nov 15, 2019 Updated 6 years ago
- ☆15 Apr 11, 2024 Updated last year
- ☆32 Nov 18, 2025 Updated 3 months ago
- Extrapolating RLVR to General Domains without Verifiers ☆201 Aug 12, 2025 Updated 6 months ago
- [ICCV 2023] Generative Prompt Model for Weakly Supervised Object Localization ☆57 Nov 10, 2023 Updated 2 years ago
- ☆46 Sep 27, 2025 Updated 5 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective ☆1,219 Aug 27, 2025 Updated 6 months ago
- UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation ☆23 May 16, 2025 Updated 9 months ago
- (NeurIPS 2025 🔥) Official implementation for "Efficient Multi-modal Large Language Models via Progressive Consistency Distillation" ☆41 Feb 11, 2026 Updated 3 weeks ago
- [ACL 2025] Can MLLMs Understand the Deep Implication Behind Chinese Images? ☆20 Oct 20, 2025 Updated 4 months ago
- Official implementation of "Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs" ☆19 May 23, 2025 Updated 9 months ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning ☆62 Oct 24, 2025 Updated 4 months ago
- xKV: Cross-Layer SVD for KV-Cache Compression ☆44 Nov 30, 2025 Updated 3 months ago
- Official Repo For the [AAAI'26 Oral] Paper “StyleTailor: Towards Personalized Fashion Styling via Hierarchical Negative Feedback” ☆30 Updated this week
- Dataset for Detecting and Explaining Causes From Text For a Time Series Event, EMNLP'17 ☆15 Aug 31, 2020 Updated 5 years ago
- 🤓 A collection of AWESOME structured summaries of Large Language Models (LLMs) ☆11 Sep 7, 2023 Updated 2 years ago
- Official implementation of paper: LiNo: Advancing Recursive Residual Decomposition of Linear and Nonlinear Patterns for Robust Time Serie… ☆18 Dec 19, 2025 Updated 2 months ago