Visual Spatial Tuning
☆176Feb 19, 2026Updated last week
Alternatives and similar repositories for VST
Users that are interested in VST are comparing it to the libraries listed below
- ☆10Apr 7, 2025Updated 10 months ago
- [ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models☆78Jan 21, 2026Updated last month
- LEO: A powerful Hybrid Multimodal LLM☆19Jan 18, 2025Updated last year
- [CVPR 2026] SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence☆63Jul 9, 2025Updated 7 months ago
- [CVPR 2026] VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction☆341Updated this week
- More reliable Video Understanding Evaluation☆14Sep 23, 2025Updated 5 months ago
- Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs' (NeurIPS 2025)☆30Oct 28, 2025Updated 4 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models☆51Jun 12, 2025Updated 8 months ago
- A Comprehensive Dataset for Advanced Image Generation and Editing☆31Oct 2, 2025Updated 5 months ago
- Scaling Spatial Intelligence with Multimodal Foundation Models☆177Feb 6, 2026Updated 3 weeks ago
- Codes for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]☆45Jul 22, 2025Updated 7 months ago
- [ICCV 2025] LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion☆296Jul 15, 2025Updated 7 months ago
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning"☆21Oct 28, 2024Updated last year
- [ICCV 2025] Official implementation of "AD-GS: Object-Aware B-Spline Gaussian Splatting for Self-Supervised Autonomous Driving"☆35Jul 15, 2025Updated 7 months ago
- Code for paper "Half-Physics: Enabling Kinematic 3D Human Model with Physical Interactions". Coming soon.☆33Jul 31, 2025Updated 7 months ago
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence☆438Feb 5, 2026Updated 3 weeks ago
- ☆33Apr 11, 2025Updated 10 months ago
- [ICCV 2025] Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models☆27Jan 7, 2026Updated last month
- ☆23Apr 19, 2024Updated last year
- The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'☆204Nov 28, 2025Updated 3 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆159Dec 6, 2024Updated last year
- [EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation☆31Jun 12, 2025Updated 8 months ago
- MAPLE infuses dexterous manipulation priors from egocentric videos into vision encoders, making their features well-suited for downstream…☆29Dec 9, 2025Updated 2 months ago
- [ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling☆433Updated this week
- [ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation☆57Sep 12, 2025Updated 5 months ago
- ☆28Apr 8, 2025Updated 10 months ago
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning☆46Jul 17, 2025Updated 7 months ago
- ☆19Updated this week
- ☆18Aug 7, 2025Updated 6 months ago
- Official implementation of paper "Controllable 3D Outdoor Scene Generation via Scene Graphs" (ICCV 2025)☆62Jul 19, 2025Updated 7 months ago
- ☆23Feb 12, 2026Updated 2 weeks ago
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows☆19Nov 4, 2025Updated 3 months ago
- HyperPose☆12Nov 6, 2025Updated 3 months ago
- SpatialVID: A Large-Scale Video Dataset with Spatial Annotations☆502Feb 21, 2026Updated last week
- [ECCV 2024] Improving 2D Feature Representations by 3D-Aware Fine-Tuning☆310Dec 21, 2025Updated 2 months ago
- ICML2025☆63Aug 28, 2025Updated 6 months ago
- [NeurIPS 2025] Streaming 3D Reconstruction with Explicit Spatial Pointer Memory☆179Sep 26, 2025Updated 5 months ago
- ☆132Mar 22, 2025Updated 11 months ago
- [NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"☆133Nov 4, 2025Updated 3 months ago