VITA-Group / VLM-3R
VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
☆179 · Updated this week
Alternatives and similar repositories for VLM-3R
Users interested in VLM-3R are comparing it to the repositories listed below.
- [NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding ☆93 · Updated 4 months ago
- SceneFun3D ToolKit ☆143 · Updated 2 months ago
- [CVPR 2025] Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video ☆153 · Updated last month
- [ICLR 2025] Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction ☆36 · Updated 4 months ago
- MEt3R: Measuring Multi-View Consistency in Generated Images ☆107 · Updated last month
- ☆70 · Updated last week
- SpatialTrackerV3: 3D Point Tracking Made Easy ☆95 · Updated last week
- The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors' ☆45 · Updated last week
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆123 · Updated last month
- Official implementation of “4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models” (CVPR 2025) ☆112 · Updated 2 months ago
- [CVPR 2024 Highlight] Code release for "The More You See in 2D, the More You Perceive in 3D" ☆62 · Updated 8 months ago
- ☆47 · Updated last month
- [ICCV2023] 🧊FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models ☆127 · Updated 10 months ago
- Official PyTorch implementation of the paper ‘CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Und… ☆49 · Updated last year
- [ICLR 2025] Official Implementation of M3: 3D-Spatial Multimodal Memory ☆164 · Updated 2 months ago
- DELTA: Dense Efficient Long-range 3D Tracking for Any video (ICLR 2025) ☆110 · Updated 2 months ago
- [CVPR2025] Feat2GS: Probing Visual Foundation Models with Gaussian Splatting ☆177 · Updated 3 months ago
- [ICLR 2025] Official code of "Segment any 3D Object with Language" ☆49 · Updated 5 months ago
- Open-world 3D part segmentation of point clouds ☆80 · Updated last month
- [ECCV 2024] EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion. ☆95 · Updated last year
- Code for "BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation", arXiv 2025. ☆62 · Updated 2 months ago
- Official implementation of the paper "Unifying 3D Vision-Language Understanding via Promptable Queries" ☆76 · Updated 10 months ago
- UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation ☆110 · Updated 2 weeks ago
- ☆63 · Updated 7 months ago
- Seeing World Dynamics in a Nutshell ☆109 · Updated 3 months ago
- [NeurIPS'24] Large Spatial Model: End-to-end Unposed Images to Semantic 3D ☆192 · Updated 2 months ago
- [CVPR 2024] 🏡Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning ☆78 · Updated last year
- Official implementation of EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting ☆37 · Updated this week
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆235 · Updated this week
- Code for MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data (CVPR 2025) ☆185 · Updated last month