SIBench / Awesome-Visual-Spatial-Reasoning

This is a project about visual spatial reasoning.

☆82

Alternatives and similar repositories for Awesome-Visual-Spatial-Reasoning

Users who are interested in Awesome-Visual-Spatial-Reasoning are comparing it to the repositories listed below.

EO-Robotics / EO1
EO: Open-source Unified Embodied Foundation Model Series
☆280 Updated last month
Qi-Zhangyang / GPT4Scene-and-VLN-R1
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
☆472 Updated 3 months ago
SanMumumu / FlowRAM
[2025CVPR] FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation
☆46 Updated 2 months ago
Egbert-Lannister / Robo-Imagine
Official code release for paper "Robo-Imagine: A Robotic Video Generation Model, For Autoregressive Long-Term Task Video Generation With …
☆28 Updated 5 months ago
InternLM / Spatial-SSRL
Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"
☆102 Updated 2 weeks ago
Qiukunpeng / Siamese-Diffusion
[CVPR 2025] Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation
☆77 Updated last month
aim-uofa / Omni-R1
[NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
☆105 Updated last month
BIT-DA / ABS
[ICML2025] Official Code of From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection
☆24 Updated 6 months ago
jailflip / jailflip-2025
☆23 Updated last month
solitaryTian / RLCFM
☆58 Updated 6 months ago
fscdc / ReasonMap
[arXiv 2025] Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps
☆71 Updated 2 months ago
OuyangKun10 / SpaceR
SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning
☆102 Updated 6 months ago
SYuan03 / MM-IFEngine
[ICCV 2025] MM-IFEngine: Towards Multimodal Instruction Following
☆116 Updated last month
EvolvingLMMs-Lab / LongVT
Incentivizing "Thinking with Long Videos" via Native Tool Calling
☆166 Updated this week
MIV-XJTU / FSDrive
[NeurIPS 2025 spotlight] Official implementation for "FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving…
☆560 Updated 3 months ago
HVision-NKU / GlimpsePrune
Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models"
☆84 Updated 4 months ago
PzySeere / MetaSpatial
MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …
☆198 Updated 8 months ago
LMD0311 / HERMES
[ICCV 2025] HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
☆210 Updated 5 months ago
cambrian-mllm / cambrian-s
Cambrian-S: Towards Spatial Supersensing in Video
☆468 Updated 2 weeks ago
Wakals / CoVT
Official repo of "Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens"
☆249 Updated this week
ZJU-REAL / SVGenius
[ACM MM 2025] SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation. https://arxiv.org/abs/2506.03139
☆74 Updated 2 months ago
mll-lab-nu / Awesome-Spatial-Intelligence-in-VLM
A paper list for spatial reasoning
☆588 Updated 2 weeks ago
AntResearchNLP / ViLaSR
[NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
☆87 Updated 5 months ago
Tavish9 / awesome-daily-AI-arxiv
🚀 Daily AI Research Digest: Tracking breakthroughs in AI/NLP/CV/Robotics with dynamic updates and paper navigation.
☆58 Updated this week
diankun-wu / Spatial-MLLM
Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
☆419 Updated last month
hhnqqq / py_hfd
A Python script for downloading Hugging Face datasets and models.
☆20 Updated 9 months ago
WayneJin0918 / SRUM
Official repo of paper "SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models". A post-training framework that creates a cost-e…
☆89 Updated last month
ZJU-REAL / ViewSpatial-Bench
ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models
☆66 Updated 7 months ago
Eason-Li-AIS / DrafterBench
A benchmark that evaluates LLMs' performance in automating drawing revision tasks.
☆56 Updated 2 weeks ago
Haochen-Wang409 / ross3d
[ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
☆63 Updated 5 months ago