vbdi / Ego3D-Bench
Spatial Reasoning with Vision-Language Models
☆34 · Updated 2 months ago
Alternatives and similar repositories for Ego3D-Bench
Users who are interested in Ego3D-Bench are comparing it to the repositories listed below.
- UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding ☆58 · Updated 5 months ago
- [EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation ☆30 · Updated 7 months ago
- Self-reimplemented version of 4D-LRM. ☆65 · Updated 8 months ago
- [CVPR 2025] Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields ☆32 · Updated 3 months ago
- Seeing World Dynamics in a Nutshell ☆111 · Updated 10 months ago
- From Flatland to Space (SPAR). Accepted to NeurIPS 2025 Datasets & Benchmarks. A large-scale dataset & benchmark for 3D spatial perceptio… ☆75 · Updated 3 weeks ago
- [ECCV 2024] EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion. ☆100 · Updated last year
- ☆88 · Updated 8 months ago
- Open-world 3D part segmentation of point clouds ☆112 · Updated 6 months ago
- Official Implementation of "Geometrically-Constrained Agent for Spatial Reasoning" ☆48 · Updated last month
- PyTorch implementation of GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting ☆101 · Updated 9 months ago
- SceneCompleter: Dense 3D Scene Completion for Generative Novel View Synthesis ☆36 · Updated 7 months ago
- Official Code for 'AR-1-to-3: Single Image to Consistent 3D Object Generation via Next-View Prediction' (ICCV 2025) ☆62 · Updated 2 months ago
- The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors' ☆196 · Updated 2 months ago
- WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes ☆106 · Updated 10 months ago
- Official implementation of ICCV 2025 paper "EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds". ☆45 · Updated 7 months ago
- Official implementation of “4D LangVGGT: 4D Language-Visual Geometry Grounded Transformer” ☆75 · Updated last month
- ☆79 · Updated last year
- [CVPR2025] Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation ☆140 · Updated 6 months ago
- Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs' (NeurIPS 2025) ☆30 · Updated 3 months ago
- [Arxiv'25] DINO-Tok: Adapting DINO for Visual Tokenizers ☆35 · Updated 2 months ago
- ☆278 · Updated 3 months ago
- [NeurIPS 2025] 3DRS: MLLMs Need 3D-Aware Representation Supervision for Scene Understanding ☆140 · Updated last month
- Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation ☆205 · Updated last month
- [ICLR26] Official implementation of Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling ☆135 · Updated this week
- Official code for paper: N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models ☆80 · Updated 2 weeks ago
- "Comp4D: Compositional 4D Scene Generation", Dejia Xu*, Hanwen Liang*, Neel P. Bhatt, Hezhen Hu, Hanxue Liang, Konstantinos N. Platanioti… ☆78 · Updated last year
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs ☆59 · Updated 3 weeks ago
- OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling ☆417 · Updated 3 weeks ago
- [CVPR 2025] Official code for the paper "SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis" ☆132 · Updated 10 months ago