vaew / Awesome-spatial-visual-reasoning-MLLMs
Repository for awesome spatial/visual reasoning MLLMs (with a focus on embodied applications).
☆53 · Updated last week
Alternatives and similar repositories for Awesome-spatial-visual-reasoning-MLLMs
Users interested in Awesome-spatial-visual-reasoning-MLLMs are comparing it to the libraries listed below.
- R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization ☆399 · Updated last week
- OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models ☆131 · Updated 2 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆65 · Updated 11 months ago
- ☆37 · Updated 11 months ago
- TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆51 · Updated last week
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆46 · Updated 7 months ago
- ☆80 · Updated 5 months ago
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models ☆64 · Updated 3 months ago
- Official implementation of MIA-DPO ☆58 · Updated 5 months ago
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆65 · Updated 2 months ago
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency ☆42 · Updated 2 weeks ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆69 · Updated 3 weeks ago
- Repository of the paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models ☆37 · Updated last year
- The official implementation of RAR ☆88 · Updated last year
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models ☆40 · Updated 2 months ago
- ☆25 · Updated last year
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆47 · Updated 3 months ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency ☆111 · Updated last month
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆50 · Updated 6 months ago
- WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning ☆24 · Updated 2 weeks ago
- ☆86 · Updated 3 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆80 · Updated 5 months ago
- ☆44 · Updated 5 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆56 · Updated 9 months ago
- ☆61 · Updated last month
- The official implementation of "Grounded Chain-of-Thought for Multimodal Large Language Models" ☆12 · Updated 3 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆104 · Updated last month
- Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning ☆49 · Updated last month
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆47 · Updated 3 months ago
- Official repo for EscapeCraft (a 3D environment for room escape) and the MM-Escape benchmark ☆16 · Updated 3 weeks ago