AdaCheng / EgoThinkLinks

[CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models"

☆63

Alternatives and similar repositories for EgoThink

Users that are interested in EgoThink are comparing it to the libraries listed below

Sorting:

ChenYi99 / EgoPlan
[IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
☆74Updated last year
alanaai / EVUD
Egocentric Video Understanding Dataset (EVUD)
☆32Updated last year
OpenGVLab / VeBrain
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
☆87Updated 6 months ago
SilongYong / SQA3D
[ICLR 2023] SQA3D for embodied scene understanding and reasoning
☆152Updated 2 years ago
VincentDENGP / 3D-LR
Can 3D Vision-Language Models Truly Understand Natural Language?
☆20Updated last year
UMass-Embodied-AGI / CoVLM
[ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
☆46Updated 6 months ago
physical-superintelligence-lab / PhysBench
[ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …
☆79Updated 6 months ago
Gabesarch / grounded-rl
☆108Updated 4 months ago
MINT-SJTU / STI-Bench
STI-Bench : Are MLLMs Ready for Precise Spatial-Temporal World Understanding?
☆33Updated 5 months ago
ATR-DBI / ScanQA
☆147Updated 2 years ago
zhijie-group / R1-Zero-VSI
☆41Updated 6 months ago
mll-lab-nu / MindCube
☆102Updated last month
VinceOuti / Open3DVQA
☆30Updated 3 weeks ago
OuyangKun10 / SpaceR
SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning
☆98Updated 5 months ago
z-x-yang / DoraemonGPT
Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models
☆88Updated last year
ZCMax / ScanReason
[ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities
☆80Updated last year
2toinf / IVM
[NeurIPS-2024] The offical Implementation of "Instruction-Guided Visual Masking"
☆39Updated last year
qiulu66 / EgoPlan-Bench2
☆26Updated 7 months ago
mahtabbigverdi / Aurora-perception
☆42Updated 3 months ago
eric-ai-lab / MMWorld
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
☆29Updated 4 months ago
Zhoues / RoboRefer
[NeurIPS 2025] Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"
☆205Updated last month
hmxiong / StreamChat
Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025
☆89Updated 8 months ago
declare-lab / Emma-X
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning
☆78Updated 6 months ago
MSR3D / MSR3D
[NeurIPS 2024] Official code repository for MSR3D paper
☆68Updated last week
Chat-3D / Chat-3D
Code for "Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes"
☆56Updated last year
yonseivnl / vlm-rlaif
ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
☆76Updated last year
UMass-Embodied-AGI / MultiPLY
Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
☆134Updated last year
qizekun / OmniSpatial
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
☆76Updated 2 months ago
linkangheng / Video-UTR
[ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs
☆61Updated 9 months ago
OpenHelix-Team / VLA-RFT
VLA-RFT: Vision-Language-Action Models with Reinforcement Fine-Tuning
☆96Updated 2 months ago