vbdi/Ego3D-Bench

[ICLR2026] Spatial Reasoning with Vision-Language Models

☆60

Alternatives and similar repositories for Ego3D-Bench

Users who are interested in Ego3D-Bench are comparing it to the libraries listed below.

NVlabs / SpaceTools-Toolshed
☆16 · Mar 24, 2026 · Updated 4 months ago
mll-lab-nu / MindCube
☆164 · Mar 23, 2026 · Updated 4 months ago
Visual-AI / 3DRS
[NeurIPS 2025] 3DRS: MLLMs Need 3D-Aware Representation Supervision for Scene Understanding
☆158 · Dec 9, 2025 · Updated 7 months ago
vbdi / divprune
[CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
☆86 · Apr 16, 2026 · Updated 3 months ago
irom-princeton / spine
Geometry Meets Vision: Revisiting Pretrained Semantics in Distilled Fields
☆32 · Oct 3, 2025 · Updated 9 months ago
DYZhang09 / ViTWSS3D
[ICCV 23] A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection
☆13 · Apr 12, 2024 · Updated 2 years ago
kaist-cvml / geometric-distillation
[EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation
☆39 · Jun 12, 2025 · Updated last year
UMass-Embodied-AGI / MindJourney
[NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"
☆151 · Nov 4, 2025 · Updated 8 months ago
LaVi-Lab / VG-LLM
The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'
☆248 · Nov 28, 2025 · Updated 8 months ago
hunarbatra / SpatialThinker
SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards
☆41 · Jan 28, 2026 · Updated 6 months ago
Yui010206 / Adaptive-Visual-Imagination-Control
When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
☆18 · Jun 2, 2026 · Updated last month
zhangquanchen / 3DThinker
[CVPR 2026] Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views
☆244 · May 7, 2026 · Updated 2 months ago
THU-SI / Spatial-MLLM
[NeurIPS 2025 Spotlight] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
☆480 · Feb 5, 2026 · Updated 5 months ago
vision-x-nyu / thinking-in-space
Official repo and evaluation implementation of VSI-Bench
☆734 · Aug 5, 2025 · Updated 11 months ago
WanyueZhang-ai / spatial-understanding
☆20 · Sep 3, 2025 · Updated 10 months ago
VITA-Group / VLM-3R
[CVPR 2026] VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
☆431 · Jul 15, 2026 · Updated 2 weeks ago
mll-lab-nu / Theory-of-Space
THEORY OF SPACE: a benchmark for evaluating whether foundation models can actively explore under partial observability efficiently to bui…
☆85 · Feb 27, 2026 · Updated 5 months ago
AntResearchNLP / ViLaSR
[NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
☆98 · Jul 27, 2025 · Updated last year
AnjieCheng / SpatialRGPT
[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
☆336 · Dec 14, 2024 · Updated last year
Chenyu-Wang567 / All-Angles-Bench
Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs
☆70 · Mar 22, 2026 · Updated 4 months ago
STARE-bench / STARE
☆19 · Oct 12, 2025 · Updated 9 months ago
KAIST-Visual-AI-Group / VG-AVS
Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection
☆24 · Feb 5, 2026 · Updated 5 months ago
ExplainableML / ImageSelect
Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection"
☆27 · Jul 10, 2023 · Updated 3 years ago
Yangr116 / VST
[ECCV2026] Visual Spatial Tuning
☆200 · Mar 25, 2026 · Updated 4 months ago
zhengrongz / AoTD
[CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation".
☆58 · Updated this week
OpenSenseNova / SenseNova-SI
[CVPR 2026] Scaling Spatial Intelligence with Multimodal Foundation Models
☆290 · May 14, 2026 · Updated 2 months ago
InternRobotics / G2VLM
[CVPR 2026] G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
☆346 · Apr 18, 2026 · Updated 3 months ago
mgholamikn / PoseGen
[AAAI 2024] PoseGen: Learning to Generate 3D Human Pose Datasets with NeRF
☆10 · Dec 29, 2023 · Updated 2 years ago
synsin0 / COME
Adding Scene-Centric Forecasting Control to Occupancy World Model
☆43 · Jul 1, 2026 · Updated 3 weeks ago
mll-lab-nu / Awesome-Spatial-Intelligence-in-VLM
A paper list for spatial reasoning
☆767 · Jan 19, 2026 · Updated 6 months ago
join16 / LEAP
Learning to Enhance Aperture Phasor Field for Non-Line-of-Sight Imaging
☆15 · Dec 31, 2024 · Updated last year
arijitray1993 / SAT
Spatial Aptitude Training for Multimodal Language Models
☆33 · Feb 8, 2026 · Updated 5 months ago
mengcaopku / SpatialDreamer
SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery
☆15 · Feb 1, 2026 · Updated 5 months ago
sosppxo / MDIN
[MM2024 Oral] 3D-GRES: Generalized 3D Referring Expression Segmentation
☆43 · Dec 15, 2024 · Updated last year
vision-x-nyu / test-set-training
☆15 · Nov 25, 2025 · Updated 8 months ago
yangcaoai / VGGT-Det-CVPR2026
Official code for CVPR 2026 paper: VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection
☆145 · Jul 15, 2026 · Updated 2 weeks ago
zhang9302002 / ThinkingWithVideos
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
☆102 · Oct 15, 2025 · Updated 9 months ago
qq456cvb / 3DCorrEnhance
☆37 · Jun 13, 2026 · Updated last month
wangzhichuan123 / DAC
[ICCV 2025] Official PyTorch Code for "Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval"
☆18 · Aug 23, 2025 · Updated 11 months ago