LaVi-Lab / VG-LLM
The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'
☆45 · Updated last week
Alternatives and similar repositories for VG-LLM
Users interested in VG-LLM are comparing it to the repositories listed below.
- ☆47 · Updated last month
- [ECCV 2024] EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion. ☆95 · Updated last year
- VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction ☆179 · Updated this week
- [CVPR 2025] The code for paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding". ☆117 · Updated 3 weeks ago
- Self-reimplemented version of 4D-LRM. ☆30 · Updated 3 weeks ago
- Open-world 3D part segmentation of point clouds ☆80 · Updated last month
- [ICCV 2025] SpatialTrackerV2: 3D Point Tracking Made Easy ☆103 · Updated this week
- Official implementation of the paper "Unifying 3D Vision-Language Understanding via Promptable Queries" ☆76 · Updated 10 months ago
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs ☆41 · Updated last month
- [NeurIPS 2024] Official code repository for MSR3D paper ☆60 · Updated last week
- Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness". ☆29 · Updated 2 weeks ago
- ☆71 · Updated 3 weeks ago
- [ICLR 2025] MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow ☆22 · Updated 2 months ago
- [NeurIPS 2024] Zero-Shot Scene Reconstruction from Single Images with Deep Prior Assembly ☆58 · Updated 6 months ago
- [CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning ☆39 · Updated 6 months ago
- The official repository for paper "MLLMs Need 3D-Aware Representation Supervision for Scene Understanding" ☆58 · Updated 2 weeks ago
- Code for "Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes" ☆54 · Updated last year
- Code of 3DMIT: 3D MULTI-MODAL INSTRUCTION TUNING FOR SCENE UNDERSTANDING ☆30 · Updated 11 months ago
- [NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding ☆93 · Updated 4 months ago
- Code for "BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation", ICCV 2025. ☆66 · Updated this week
- [ICLR 2025] Official code of "Segment any 3D Object with Language" ☆49 · Updated this week
- "VicaSplat: A Single Run is All You Need for 3D Gaussian Splatting and Camera Estimation from Unposed Video Frames" ☆75 · Updated last month
- [arXiv 2024] LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding ☆31 · Updated 3 months ago
- [CVPR 2025] Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation ☆96 · Updated 2 weeks ago
- [ICCV 2023] Multi3DRefer: Grounding Text Description to Multiple 3D Objects ☆85 · Updated last year
- Seeing World Dynamics in a Nutshell ☆109 · Updated 3 months ago
- Official implementation of EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance ☆34 · Updated 3 weeks ago
- [ICLR 2025] Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction ☆36 · Updated 4 months ago
- ☆35 · Updated 3 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆63 · Updated 2 weeks ago