ChenYi99 / EgoPlanLinks

[IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning

☆74

Alternatives and similar repositories for EgoPlan

Users that are interested in EgoPlan are comparing it to the libraries listed below

Sorting:

alanaai / EVUD
Egocentric Video Understanding Dataset (EVUD)
☆32Updated last year
UMass-Embodied-AGI / CoVLM
[ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
☆45Updated 5 months ago
EmbodiedGPT / EgoCOT_Dataset
☆54Updated last year
AdaCheng / EgoThink
[CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan…
☆62Updated 7 months ago
2toinf / IVM
[NeurIPS-2024] The offical Implementation of "Instruction-Guided Visual Masking"
☆39Updated last year
facebookresearch / ego4d-goalstep
Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023)
☆50Updated last year
Gabesarch / grounded-rl
☆103Updated 3 months ago
mu-cai / TemporalBench
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
☆37Updated last year
VincentDENGP / 3D-LR
Can 3D Vision-Language Models Truly Understand Natural Language?
☆20Updated last year
yonseivnl / vlm-rlaif
ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
☆76Updated last year
IranQin / MP5
[CVPR2024] This is the official implement of MP5
☆106Updated last year
PolyU-ChenLab / ETBench
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
☆70Updated 9 months ago
eric-ai-lab / MMWorld
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
☆29Updated 4 months ago
si0wang / VisVM
☆45Updated 10 months ago
Ahnsun / merlin
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
☆94Updated last year
HAWLYQ / ET-Cap
☆24Updated 2 years ago
OpenGVLab / EgoExoLearn
[CVPR 2024] Data and benchmark code for the EgoExoLearn dataset
☆70Updated 2 months ago
RifleZhang / LLaVA-Hound-DPO
☆155Updated last year
z-x-yang / DoraemonGPT
Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models
☆87Updated last year
joez17 / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆50Updated 8 months ago
kkahatapitiya / LangRepo
Code for our ACL 2025 paper "Language Repository for Long Video Understanding"
☆32Updated last year
cambridgeltl / visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial undersranding of vision-language models.
☆131Updated 2 years ago
linkangheng / Video-UTR
[ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs
☆61Updated 8 months ago
lbaermann / qaego4d
Code and Dataset for the CVPRW Paper "Where did I leave my keys? — Episodic-Memory-Based Question Answering on Egocentric Videos"
☆28Updated 2 years ago
SilongYong / SQA3D
[ICLR 2023] SQA3D for embodied scene understanding and reasoning
☆151Updated 2 years ago
CeeZh / LLoVi
Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"
☆101Updated last year
qiulu66 / EgoPlan-Bench2
☆26Updated 7 months ago
thunlp / EmbodiedEval
Evaluate Multimodal LLMs as Embodied Agents
☆54Updated 9 months ago
DoubtedSteam / MM-GCoT
The official implement of "Grounded Chain-of-Thought for Multimodal Large Language Models"
☆18Updated 3 months ago
PVIT-official / PVIT
Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
☆37Updated 2 years ago