gca-spatial-reasoning / gca
Official Implementation of "Geometrically-Constrained Agent for Spatial Reasoning"
☆23 · Updated 3 weeks ago
Alternatives and similar repositories for gca
Users who are interested in gca are comparing it to the libraries listed below.
- The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors' ☆189 · Updated last month
- VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction ☆319 · Updated 4 months ago
- [NeurIPS 2025] 3DRS: MLLMs Need 3D-Aware Representation Supervision for Scene Understanding ☆137 · Updated last month
- [NeurIPS 2025 Spotlight] Official implementation of the SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alig… ☆154 · Updated 3 months ago
- Official implementation of “4D LangVGGT: 4D Language-Visual Geometry Grounded Transformer” ☆70 · Updated last month
- From Flatland to Space (SPAR). Accepted to NeurIPS 2025 Datasets & Benchmarks. A large-scale dataset & benchmark for 3D spatial perceptio… ☆72 · Updated 3 months ago
- OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling ☆409 · Updated 3 weeks ago
- [CVPR 2025 Highlight🔥] Official code repository for "Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuni… ☆124 · Updated last month
- ☆70 · Updated 9 months ago
- [CVPR 2025] The code for paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding". ☆189 · Updated 7 months ago
- [CVPR 2024 Highlight] GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding ☆26 · Updated last year
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆419 · Updated last month
- [ICCV 2025] VLM4D: Towards Spatiotemporal Awareness in Vision Language Models ☆34 · Updated last month
- Official implementation of EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting ☆51 · Updated 6 months ago
- Public code for XFactor: Introduces the first geometry-free model to achieve true self-supervised / pose-free Novel View Synthesis (NVS) … ☆76 · Updated 2 months ago
- [NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding ☆99 · Updated 11 months ago
- [CVPR 2025] 3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer ☆84 · Updated 7 months ago
- Unifying 2D and 3D Vision-Language Understanding ☆119 · Updated 5 months ago
- [ICML2025 Oral] ReferSplat: Referring Segmentation in 3D Gaussian Splatting ☆130 · Updated 3 months ago
- A curated list of awesome papers for reconstructing 4D spatial intelligence from video. (arXiv 2507.21045) ☆412 · Updated this week
- UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding ☆58 · Updated 4 months ago
- [ACM MM 2025] EmbodiedOcc++: Boosting Embodied 3D Occupancy Prediction with Plane Regularization and Uncertainty Sampler ☆25 · Updated 5 months ago
- [ICLR 2025] SPA: 3D Spatial-Awareness Enables Effective Embodied Representation ☆171 · Updated 6 months ago
- Official code for paper: N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models ☆64 · Updated 2 weeks ago
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs ☆58 · Updated last week
- [NeurIPS 24] The implementation and dataset of LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and… ☆60 · Updated 9 months ago
- SpatialVID: A Large-Scale Video Dataset with Spatial Annotations ☆465 · Updated 3 weeks ago
- [NeurIPS 2025] InternScenes: A Large-scale Interactive Indoor Scene Dataset with Realistic Layouts. ☆206 · Updated 2 months ago
- Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs' (NeurIPS 2025) ☆27 · Updated 2 months ago
- (ECCV'24) Official Implementation of SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior. ☆16 · Updated last year