EvolvingLMMs-Lab / MGPOLinks

High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning

☆52

Alternatives and similar repositories for MGPO

Users that are interested in MGPO are comparing it to the libraries listed below

Sorting:

OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆62Updated 4 months ago
zhijie-group / UniCMs
☆39Updated 6 months ago
markywg / transagent
[NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
☆24Updated last year
TencentARC / SEED-Bench-R1
☆95Updated 5 months ago
inclusionAI / M2-Reasoning
M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning
☆46Updated 4 months ago
TencentARC / GRPO-CARE
☆79Updated 5 months ago
si0wang / VisVM
☆46Updated 11 months ago
multimodal-reasoning-lab / Bagel-Zebra-CoT
https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT
☆104Updated last month
TencentARC / Video-Holmes
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆80Updated 4 months ago
kxfan2002 / SophiaVL-R1
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
☆87Updated 4 months ago
EvolvingLMMs-Lab / VideoMMMU
☆62Updated 3 months ago
Haochen-Wang409 / TreeVGR
Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"
☆71Updated last month
Hon-Wong / ByteVideoLLM
[ICCV 2025] Dynamic-VLM
☆26Updated 11 months ago
Gen-Verse / HermesFlow
[NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
☆72Updated 2 months ago
Beckschen / LLaVolta
[NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression
☆62Updated 9 months ago
marinero4972 / Open-o3-Video
Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆120Updated 3 weeks ago
Fr0zenCrane / UniCoT
Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
☆177Updated 2 weeks ago
Tiezheng11 / Vision-Language-Vision
☆63Updated 4 months ago
RenShuhuai-Andy / NBP
Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
☆40Updated 9 months ago
OpenGVLab / PVC
[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
☆49Updated 5 months ago
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆72Updated last year
AV-Odyssey / AV-Odyssey
This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"
☆31Updated 11 months ago
inst-it / inst-it
[NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun…
☆38Updated 9 months ago
Liuziyu77 / MIA-DPO
Official implement of MIA-DPO
☆67Updated 10 months ago
NVlabs / QLIP
[arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
☆94Updated 9 months ago
xjtupanda / Sparrow
Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"
☆48Updated 3 months ago
TIGER-AI-Lab / VISTA
The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]
☆20Updated 9 months ago
MajorDavidZhang / Generalization_unified_VLM
☆21Updated 6 months ago
eric-ai-lab / GRIT
Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
☆164Updated last week
zhouyiks / CoLVA
☆40Updated 4 months ago