facebookresearch / DepthLM_Official
Official implementation of DepthLM
☆243 Updated last month
Alternatives and similar repositories for DepthLM_Official
Users who are interested in DepthLM_Official are comparing it to the libraries listed below.
- VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction ☆293 Updated 2 months ago
- [CVPR 2024] Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning ☆80 Updated last year
- Trace Anything: Representing Any Video in 4D via Trajectory Fields ☆372 Updated 2 weeks ago
- OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling ☆387 Updated this week
- [NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding ☆96 Updated 9 months ago
- [NeurIPS 2025] LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS ☆134 Updated 3 weeks ago
- The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors' ☆154 Updated last month
- [ECCV 2024] Pytorch code for our ECCV'24 paper NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Ra… ☆104 Updated 7 months ago
- [ICLR 2025] Official Implementation of M3: 3D-Spatial Multimodal Memory ☆188 Updated 6 months ago
- [ECCV 2024] Improving 2D Feature Representations by 3D-Aware Fine-Tuning ☆301 Updated 2 months ago
- Seeing World Dynamics in a Nutshell ☆110 Updated 7 months ago
- Unifying 2D and 3D Vision-Language Understanding ☆116 Updated 3 months ago
- SpatialVID: A Large-Scale Video Dataset with Spatial Annotations ☆410 Updated this week
- [Arxiv'25] MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization ☆49 Updated last month
- Official implementation of paper "Pyramid Diffusion for Fine 3D Large Scene Generation" (ECCV 2024 Oral) ☆129 Updated 7 months ago
- Multi-SpatialMLLM Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆158 Updated last month
- [CVPR 2025] Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video ☆200 Updated 5 months ago
- [CVPR 2024 Highlight] GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding ☆27 Updated last year
- 3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding ☆353 Updated 2 weeks ago
- Official implementation of “4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models” (CVPR 2025) ☆155 Updated last month
- [EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation ☆24 Updated 5 months ago
- [NeurIPS 2025] Streaming 3D Reconstruction with Explicit Spatial Pointer Memory ☆160 Updated last month
- ☆112 Updated 4 months ago
- Pytorch implementation of GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting ☆97 Updated 7 months ago
- Official Implementation of paper "St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World" ☆80 Updated last month
- SceneFun3D ToolKit ☆159 Updated 6 months ago
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆380 Updated 4 months ago
- UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding ☆55 Updated 2 months ago
- A curated list of awesome papers for reconstructing 4D spatial intelligence from video. (arXiv 2507.21045) ☆360 Updated last week
- [NeurIPS 2025 Spotlight] Official implementation of the SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alig… ☆136 Updated last month