neu-vi / struct2d
Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in Large Multimodal Models'
☆19 Updated last month
Alternatives and similar repositories for struct2d
Users who are interested in struct2d are comparing it to the repositories listed below
- [EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation ☆21 Updated 3 months ago
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs ☆48 Updated last month
- From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D ☆56 Updated 3 months ago
- [3DV 2025] Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model ☆95 Updated 3 months ago
- Self-reimplemented version of 4D-LRM. ☆52 Updated 3 months ago
- UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding ☆47 Updated 3 weeks ago
- [ICLR 2025] MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow ☆23 Updated 5 months ago
- The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors' ☆116 Updated last month
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding ☆54 Updated 2 months ago
- [CVPR 2025] GPS as a Control Signal for Image Generation ☆21 Updated 5 months ago
- [ICLR 2025] Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction ☆39 Updated last month
- [ICLR 2025] Official code of "Segment any 3D Object with Language" ☆51 Updated 2 months ago
- The official repository for paper "MLLMs Need 3D-Aware Representation Supervision for Scene Understanding" ☆93 Updated 3 weeks ago
- [ICCV 2025] Official PyTorch implementation of "SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering" ☆49 Updated 5 months ago
- Official code of DMA: Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding, ECCV 2024 ☆29 Updated last year
- [ECCV 2024] EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion. ☆99 Updated last year
- Official implementation of PARIS3D (Accepted to ECCV 2024). ☆26 Updated 11 months ago
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness ☆53 Updated last month
- Open-Vocabulary SAM3D: Understand Any 3D Scene ☆31 Updated 3 months ago
- Paper: UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting ☆20 Updated 3 months ago
- Open-world 3D part segmentation of point clouds ☆84 Updated last month
- Official Code for 'AR-1-to-3: Single Image to Consistent 3D Object Generation via Next-View Prediction' (ICCV 2025) ☆54 Updated last month
- [NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding ☆95 Updated 7 months ago
- Open-source video dataset with annotations for dynamic scenes and camera movements ☆73 Updated 4 months ago
- Official implementation of EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance ☆43 Updated 3 months ago
- [NeurIPS 2024 Oral] RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation ☆18 Updated 8 months ago
- Official Implementation of VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Jo… ☆20 Updated 2 months ago