mll-lab-nu / Awesome-Spatial-Intelligence-in-VLM
A paper list for spatial reasoning
★614 · Updated last week
Alternatives and similar repositories for Awesome-Spatial-Intelligence-in-VLM
Users interested in Awesome-Spatial-Intelligence-in-VLM are comparing it to the libraries listed below.
- This is a repository for organizing papers, codes and other resources related to Visual Reinforcement Learning. ★391 · Updated last week
- Official repo and evaluation implementation of VSI-Bench ★667 · Updated 5 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ★201 · Updated 8 months ago
- Cambrian-S: Towards Spatial Supersensing in Video ★482 · Updated last month
- [NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models" ★308 · Updated last year
- ★118 · Updated 2 months ago
- ★114 · Updated 6 months ago
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ★424 · Updated last week
- Official implementation of "Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance" ★229 · Updated last week
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ★103 · Updated 6 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ★233 · Updated 5 months ago
- Compose multimodal datasets ★542 · Updated 3 weeks ago
- Thinking in 360°: Humanoid Visual Search in the Wild ★111 · Updated this week
- [NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration ★113 · Updated last month
- [CVPR 2025] The code for paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding". ★195 · Updated 7 months ago
- PyTorch implementation of NEPA ★296 · Updated last month
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence ★74 · Updated last week
- A paper list of Awesome Latent Space. ★305 · Updated last week
- [NeurIPS 2025] Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics" ★222 · Updated last month
- Official Implementation of "Geometrically-Constrained Agent for Spatial Reasoning" ★48 · Updated last month
- A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts… ★1,769 · Updated last week
- A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and A… ★1,134 · Updated last week
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ★378 · Updated 10 months ago
- Visual Planning: Let's Think Only with Images ★294 · Updated 8 months ago
- [ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ★369 · Updated 3 months ago
- [NeurIPS 2025] Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning. ★264 · Updated 3 months ago
- [NeurIPS 2025] OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding ★70 · Updated 4 months ago
- The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors' ★196 · Updated 2 months ago
- Latest Advances on Embodied Multimodal LLMs (or Vision-Language-Action Models). ★121 · Updated last year
- A Large-scale Video Action Dataset ★341 · Updated 2 weeks ago