facebookresearch / DepthLM_Official
Official implementation of DepthLM
★229 Updated 2 weeks ago
Alternatives and similar repositories for DepthLM_Official
Users who are interested in DepthLM_Official are comparing it to the repositories listed below.
- VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction ★281 Updated last month
- [CVPR 2024] Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning ★79 Updated last year
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ★156 Updated 2 weeks ago
- [NeurIPS 2025] LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS ★130 Updated last week
- [NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding ★95 Updated 8 months ago
- OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling ★376 Updated last week
- [ICLR 2025] Official Implementation of M3: 3D-Spatial Multimodal Memory ★184 Updated 6 months ago
- The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors' ★145 Updated 2 weeks ago
- [ECCV 2024] Pytorch code for our ECCV'24 paper NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Ra… ★103 Updated 7 months ago
- Trace Anything: Representing Any Video in 4D via Trajectory Fields ★296 Updated last week
- Seeing World Dynamics in a Nutshell ★109 Updated 7 months ago
- SpatialVID: A Large-Scale Video Dataset with Spatial Annotations ★392 Updated this week
- Official implementation of “4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models” (CVPR 2025) ★148 Updated 2 weeks ago
- Unifying 2D and 3D Vision-Language Understanding ★111 Updated 3 months ago
- Self-reimplemented version of 4D-LRM. ★59 Updated 4 months ago
- [ECCV 2024] Improving 2D Feature Representations by 3D-Aware Fine-Tuning ★300 Updated last month
- From Flatland to Space (SPAR). Accepted to NeurIPS 2025 Datasets & Benchmarks. A large-scale dataset & benchmark for 3D spatial perceptio… ★59 Updated 2 weeks ago
- SceneFun3D ToolKit ★157 Updated 6 months ago
- [AAAI 2025] GFlow: Recovering 4D World from Monocular Video ★55 Updated 5 months ago
- [CVPR 2025] Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video ★196 Updated 5 months ago
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs ★49 Updated 3 months ago
- [EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation ★24 Updated 4 months ago
- UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding ★54 Updated 2 months ago
- ★111 Updated 4 months ago
- MEt3R: Measuring Multi-View Consistency in Generated Images ★138 Updated 3 months ago
- Official implementation of EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance ★43 Updated 4 months ago
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ★366 Updated 4 months ago
- [NeurIPS 2025] Pixel-Perfect Depth ★528 Updated last week
- Official implementation of paper "Pyramid Diffusion for Fine 3D Large Scene Generation" (ECCV 2024 Oral) ★126 Updated 6 months ago
- Pytorch implementation of GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting ★96 Updated 6 months ago