SIBench / Awesome-Visual-Spatial-Reasoning
This is a project about visual spatial reasoning.
☆82 Updated last month
Alternatives and similar repositories for Awesome-Visual-Spatial-Reasoning
Users who are interested in Awesome-Visual-Spatial-Reasoning are comparing it to the repositories listed below.
- EO: Open-source Unified Embodied Foundation Model Series ☆280 Updated last month
- GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models ☆472 Updated 3 months ago
- [2025CVPR] FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation ☆46 Updated 2 months ago
- Official code release for paper "Robo-Imagine: A Robotic Video Generation Model, For Autoregressive Long-Term Task Video Generation With … ☆28 Updated 5 months ago
- Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning" ☆102 Updated 2 weeks ago
- [CVPR 2025] Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation ☆77 Updated last month
- [NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration ☆105 Updated last month
- [ICML2025] Official Code of From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection ☆24 Updated 6 months ago
- ☆23 Updated last month
- ☆58 Updated 6 months ago
- [arXiv 2025] Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps ☆71 Updated 2 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆102 Updated 6 months ago
- [ICCV 2025] MM-IFEngine: Towards Multimodal Instruction Following ☆116 Updated last month
- Incentivizing "Thinking with Long Videos" via Native Tool Calling ☆166 Updated this week
- [NeurIPS 2025 spotlight] Official implementation for "FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving… ☆560 Updated 3 months ago
- Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models" ☆84 Updated 4 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆198 Updated 8 months ago
- [ICCV 2025] HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation ☆210 Updated 5 months ago
- Cambrian-S: Towards Spatial Supersensing in Video ☆468 Updated 2 weeks ago
- Official repo of "Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens" ☆249 Updated this week
- [ACM MM 2025] SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation. https://arxiv.org/abs/2506.03139 ☆74 Updated 2 months ago
- A paper list for spatial reasoning ☆588 Updated 2 weeks ago
- [NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆87 Updated 5 months ago
- 🚀 Daily AI Research Digest: Tracking breakthroughs in AI/NLP/CV/Robotics with dynamic updates and paper navigation. ☆58 Updated this week
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆419 Updated last month
- A python script for downloading huggingface datasets and models. ☆20 Updated 9 months ago
- Official repo of paper "SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models". A post-training framework that creates a cost-e… ☆89 Updated last month
- ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models ☆66 Updated 7 months ago
- A benchmark that evaluates LLMs' performance in automating drawing revision tasks. ☆56 Updated 2 weeks ago
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness ☆63 Updated 5 months ago