facebookresearch / DepthLM_Official
Official implementation of DepthLM
☆73 · Updated this week
Alternatives and similar repositories for DepthLM_Official
Users that are interested in DepthLM_Official are comparing it to the libraries listed below
- [EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation ☆22 · Updated 3 months ago
- [CVPR 2024] Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning ☆81 · Updated last year
- Self-reimplemented version of 4D-LRM. ☆58 · Updated 4 months ago
- Seeing World Dynamics in a Nutshell ☆109 · Updated 6 months ago
- ☆107 · Updated 3 months ago
- UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding ☆53 · Updated last month
- [ECCV 2024] Pytorch code for our ECCV'24 paper NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Ra… ☆104 · Updated 6 months ago
- VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction ☆268 · Updated last month
- [NeurIPS 2025] LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS ☆114 · Updated 2 weeks ago
- [ICLR 2025] Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction ☆39 · Updated last month
- Multi-SpatialMLLM Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆153 · Updated 4 months ago
- [ICLR 2025] Official Implementation of M3: 3D-Spatial Multimodal Memory ☆180 · Updated 5 months ago
- Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation ☆142 · Updated 2 months ago
- ☆33 · Updated 4 months ago
- [NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding ☆95 · Updated 8 months ago
- Pytorch implementation of GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting ☆95 · Updated 6 months ago
- MEt3R: Measuring Multi-View Consistency in Generated Images ☆135 · Updated 2 months ago
- OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling ☆364 · Updated this week
- SceneFun3D ToolKit ☆156 · Updated 5 months ago
- DELTA: Dense Efficient Long-range 3D Tracking for Any video (ICLR 2025) ☆125 · Updated 5 months ago
- Unifying 2D and 3D Vision-Language Understanding ☆106 · Updated 2 months ago
- [CVPR 2025] Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video ☆195 · Updated 4 months ago
- Official implementation of "4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models" (CVPR 2025) ☆141 · Updated 5 months ago
- ☆75 · Updated 4 months ago
- Official implementation of paper "Pyramid Diffusion for Fine 3D Large Scene Generation" (ECCV 2024 Oral) ☆126 · Updated 6 months ago
- [ICCV 2025] Amodal Depth Anything: Amodal Depth Estimation in the Wild ☆36 · Updated 8 months ago
- ☆24 · Updated 6 months ago
- ☆34 · Updated last year
- [Nips 2025] EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation ☆118 · Updated 2 months ago
- [Arxiv'25] MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization ☆46 · Updated 2 weeks ago