SIBench / Awesome-Visual-Spatial-Reasoning
This is a project about visual spatial reasoning.
☆89 · Updated 3 weeks ago
Alternatives and similar repositories for Awesome-Visual-Spatial-Reasoning
Users who are interested in Awesome-Visual-Spatial-Reasoning are comparing it to the repositories listed below.
- [2025CVPR] FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation ☆50 · Updated 2 months ago
- Official code release for paper "Robo-Imagine: A Robotic Video Generation Model, For Autoregressive Long-Term Task Video Generation With … ☆28 · Updated 6 months ago
- GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models ☆483 · Updated 4 months ago
- EO: Open-source Unified Embodied Foundation Model Series ☆287 · Updated 2 months ago
- Official implementation of "Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance" ☆363 · Updated this week
- [CVPR 2025] Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation ☆81 · Updated 2 months ago
- Official repo of "Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens" ☆273 · Updated 3 weeks ago
- [NeurIPS 2025 spotlight] Official implementation for "FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving… ☆581 · Updated 4 months ago
- Official repo of paper "SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models". A post-training framework that creates a cost-e… ☆91 · Updated 2 months ago
- Collections of Papers and Projects for Multimodal Reasoning. ☆107 · Updated 9 months ago
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆426 · Updated last week
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆203 · Updated 8 months ago
- [ICLR 2025] The official implementation of "PSEC: Skill Expansion and Composition in Parameter Space", a new framework designed to facilit… ☆63 · Updated 11 months ago
- [ICML2025] Official Code of From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection ☆25 · Updated 7 months ago
- [ICCV 2025] HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation ☆227 · Updated 6 months ago
- [ICCV 2025] MM-IFEngine: Towards Multimodal Instruction Following ☆116 · Updated last month
- A paper list for spatial reasoning ☆631 · Updated last week
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆103 · Updated 6 months ago
- TorchHook: A PyTorch hooks manager, providing convenient interfaces to capture feature maps and debug models. ☆13 · Updated 4 months ago
- 🚀 Daily AI Research Digest: Tracking breakthroughs in AI/NLP/CV/Robotics with dynamic updates and paper navigation. ☆61 · Updated this week
- Official implementation for "Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts" ☆22 · Updated 7 months ago
- The official repository of [CVPR2025] DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering ☆24 · Updated 9 months ago
- [NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration ☆113 · Updated last month
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness ☆63 · Updated 6 months ago
- A Python script for downloading Hugging Face datasets and models. ☆20 · Updated 9 months ago
- Incentivizing "Thinking with Long Videos" via Native Tool Calling ☆183 · Updated last week
- Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning" ☆108 · Updated last month