VITA-Group / VLM-3R
VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
☆214 Updated 2 weeks ago
Alternatives and similar repositories for VLM-3R
Users who are interested in VLM-3R are comparing it to the libraries listed below.
- [NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding ☆94 Updated 5 months ago
- The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors' ☆76 Updated 2 weeks ago
- ☆48 Updated last month
- [CVPR 2025] Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video ☆168 Updated last month
- Multi-SpatialMLLM Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆133 Updated last month
- SceneFun3D ToolKit ☆147 Updated 2 months ago
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆285 Updated 3 weeks ago
- [NeurIPS'24] Large Spatial Model: End-to-end Unposed Images to Semantic 3D ☆198 Updated 3 months ago
- Seeing World Dynamics in a Nutshell ☆109 Updated 3 months ago
- Official implementation of “4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models” (CVPR 2025) ☆113 Updated 3 months ago
- Official implementation of paper "Pyramid Diffusion for Fine 3D Large Scene Generation" (ECCV 2024 Oral) ☆125 Updated 3 months ago
- [ICLR 2025] Official Implementation of M3: 3D-Spatial Multimodal Memory ☆166 Updated 2 months ago
- Unifying 2D and 3D Vision-Language Understanding ☆95 Updated 3 months ago
- [ICLR 2025] Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction ☆36 Updated 5 months ago
- Code for "BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation", ICCV 2025. ☆72 Updated 2 weeks ago
- [CVPR 2024] 🏡Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning ☆79 Updated last year
- "VicaSplat: A Single Run is All You Need for 3D Gaussian Splatting and Camera Estimation from Unposed Video Frames" ☆79 Updated last month
- Self-reimplemented version of 4D-LRM. ☆47 Updated last month
- [ECCV 2024] EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion. ☆97 Updated last year
- [CVPR2025] Feat2GS: Probing Visual Foundation Models with Gaussian Splatting ☆197 Updated last week
- [CVPR 2025] The code for paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding". ☆133 Updated last month
- [ICCV2023] 🧊FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models ☆128 Updated 10 months ago
- DELTA: Dense Efficient Long-range 3D Tracking for Any video (ICLR 2025) ☆115 Updated 3 months ago
- MEt3R: Measuring Multi-View Consistency in Generated Images ☆116 Updated 2 months ago
- Official PyTorch implementation of the paper ‘CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Und… ☆50 Updated last year
- Streaming 3D Reconstruction with Explicit Spatial Pointer Memory ☆98 Updated last week
- Official implementation of the paper "Unifying 3D Vision-Language Understanding via Promptable Queries" ☆77 Updated 11 months ago
- Official implementation of CVPR25 paper "Decompositional Neural Scene Reconstruction with Generative Diffusion Prior" ☆78 Updated 3 months ago
- Official implementation of EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting ☆39 Updated 3 weeks ago
- Open-world 3D part segmentation of point clouds ☆82 Updated 2 weeks ago