VITA-Group/VLM-3R

[CVPR 2026] VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction

☆431

Alternatives and similar repositories for VLM-3R

Users who are interested in VLM-3R are comparing it to the repositories listed below.

LaVi-Lab / VG-LLM
The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'
☆248 • Nov 28, 2025 • Updated 8 months ago
THU-SI / Spatial-MLLM
[NeurIPS 2025 Spotlight] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
☆480 • Feb 5, 2026 • Updated 5 months ago
CUT3R / CUT3R
Official implementation of Continuous 3D Perception Model with Persistent State
☆1,470 • Aug 27, 2025 • Updated 11 months ago
InternRobotics / G2VLM
[CVPR 2026] G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
☆346 • Apr 18, 2026 • Updated 3 months ago
Visual-AI / 3DRS
[NeurIPS 2025] 3DRS: MLLMs Need 3D-Aware Representation Supervision for Scene Understanding
☆158 • Dec 9, 2025 • Updated 7 months ago
wzzheng / StreamVGGT
[ICLR 2026] Streaming 4D Visual Geometry Transformer
☆945 • Oct 27, 2025 • Updated 9 months ago
vision-x-nyu / thinking-in-space
Official repo and evaluation implementation of VSI-Bench
☆734 • Aug 5, 2025 • Updated 11 months ago
YkiWu / Point3R
[NeurIPS 2025] Streaming 3D Reconstruction with Explicit Spatial Pointer Memory
☆192 • Mar 10, 2026 • Updated 4 months ago
LaVi-Lab / Video-3D-LLM
[CVPR 2025] The code for paper ''Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding''.
☆220 • Jun 4, 2025 • Updated last year
yyfz / Pi3
[ICLR 2026] π^3: Permutation-Equivariant Visual Geometry Learning
☆2,095 • Jul 3, 2026 • Updated 3 weeks ago
facebookresearch / DepthLM_Official
[ICLR 2026 Oral (top 1.2%)] Official implementation of DepthLM
☆363 • Jun 1, 2026 • Updated last month
LogosRoboticsGroup / SPAR
From Flatland to Space (SPAR). Accepted to NeurIPS 2025 Datasets & Benchmarks. A large-scale dataset & benchmark for 3D spatial perceptio…
☆90 • Jan 5, 2026 • Updated 6 months ago
OuyangKun10 / SpaceR
SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning
☆111 • Jul 9, 2025 • Updated last year
mll-lab-nu / Awesome-Spatial-Intelligence-in-VLM
A paper list for spatial reasoning
☆767 • Jan 19, 2026 • Updated 6 months ago
Haochen-Wang409 / ross3d
[ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
☆70 • Jul 22, 2025 • Updated last year
facebookresearch / Multi-SpatialMLLM
[CVPR 2026] Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models
☆178 • Feb 25, 2026 • Updated 5 months ago
liruilong940607 / prope
Cameras as Relative Positional Encoding
☆742 • Dec 18, 2025 • Updated 7 months ago
cambrian-mllm / cambrian-s
Cambrian-S: Towards Spatial Supersensing in Video
☆563 • Apr 3, 2026 • Updated 3 months ago
NIRVANALAN / STream3R
Dynamic 3D Foundation Model using Causal Transformer. [ICLR 2026]
☆392 • May 8, 2026 • Updated 2 months ago
AIGeeksGroup / 3D-R1
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
☆414 • Jul 20, 2026 • Updated last week
SunYangtian / UniGeo
UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation
☆136 • Jun 10, 2025 • Updated last year
kaist-cvml / geometric-distillation
[EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation
☆39 • Jun 12, 2025 • Updated last year
UMass-Embodied-AGI / MindJourney
[NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"
☆151 • Nov 4, 2025 • Updated 8 months ago
NVlabs / LSM
[NeurIPS'24] Large Spatial Model: End-to-end Unposed Images to Semantic 3D
☆236 • Feb 11, 2026 • Updated 5 months ago
InternRobotics / Aether
[ICCV 2025 & ICCV 2025 RIWM Outstanding Paper] Aether: Geometric-Aware Unified World Modeling
☆604 • Oct 26, 2025 • Updated 9 months ago
THU-SI / LangScene-X
[ICCV 2025] LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
☆302 • Jul 15, 2025 • Updated last year
THU-SI / Spatial-TTT
[ECCV 2026] Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training
☆243 • Jun 19, 2026 • Updated last month
Inception3D / TTT3R
[ICLR 2026] A simple state update rule to enhance length generalization for CUT3R
☆711 • May 11, 2026 • Updated 2 months ago
Yangr116 / VST
[ECCV2026] Visual Spatial Tuning
☆200 • Mar 25, 2026 • Updated 4 months ago
Davidyao99 / uni4d
[CVPR 2025] Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video
☆225 • May 25, 2025 • Updated last year
johnson111788 / SpatialReasoner
Training recipe for SpatialReasoner [NeurIPS 2025]
☆45 • Apr 5, 2026 • Updated 3 months ago
NJU-3DV / SpatialVID
[CVPR 2026] SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
☆589 • Apr 22, 2026 • Updated 3 months ago
facebookresearch / univlg
Unifying 2D and 3D Vision-Language Understanding
☆127 • Jul 2, 2026 • Updated 3 weeks ago
OpenSenseNova / SenseNova-SI
[CVPR 2026] Scaling Spatial Intelligence with Multimodal Foundation Models
☆290 • May 14, 2026 • Updated 2 months ago
fudan-zvg / UniUGG
UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding. Accepted to ICLR 2026.
☆63 • Jul 16, 2026 • Updated last week
hwjiang1510 / RayZer
Code for ICCV'2025 (Best student paper honorable mention) "RayZer: A Self-supervised Large View Synthesis Model"
☆444 • Nov 24, 2025 • Updated 8 months ago
Inception3D / Easi3R
[ICCV 2025] A simple training-free approach adapting DUSt3R for dynamic scenes.
☆533 • Apr 1, 2025 • Updated last year
Haian-Jin / LVSM
[ICLR 2025 Oral] Official code for "LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias"
☆550 • Aug 4, 2025 • Updated 11 months ago
WU-CVGL / GS-Reasoner
Reasoning in Space via Grounding in the World (ICLR 2025)
☆56 • Nov 3, 2025 • Updated 8 months ago