alanaai / EVUDLinks

Egocentric Video Understanding Dataset (EVUD)

☆31

Alternatives and similar repositories for EVUD

Users that are interested in EVUD are comparing it to the libraries listed below

Sorting:

ChenYi99 / EgoPlan
[IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
☆74Updated 10 months ago
OpenGVLab / EgoExoLearn
[CVPR 2024] Data and benchmark code for the EgoExoLearn dataset
☆70Updated 2 months ago
PolyU-ChenLab / ETBench
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
☆66Updated 9 months ago
Ahnsun / merlin
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
☆94Updated last year
facebookresearch / ego4d-goalstep
Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023)
☆48Updated last year
VincentDENGP / 3D-LR
Can 3D Vision-Language Models Truly Understand Natural Language?
☆21Updated last year
UMass-Embodied-AGI / CoVLM
[ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
☆45Updated 4 months ago
eric-ai-lab / MMWorld
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
☆29Updated 3 months ago
mu-cai / TemporalBench
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
☆37Updated 11 months ago
yonseivnl / vlm-rlaif
ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
☆75Updated last year
qiulu66 / EgoPlan-Bench2
☆26Updated 6 months ago
mshukor / ima-lmms
[NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
☆21Updated last year
BolinLai / LEGO
[ECCV2024, Oral, Best Paper Finalist] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation…
☆38Updated 8 months ago
kkahatapitiya / LangRepo
Code for our ACL 2025 paper "Language Repository for Long Video Understanding"
☆32Updated last year
OpenGVLab / EgoVideo
[CVPR 2024 Champions][ICLR 2025] Solutions for EgoVis Chanllenges in CVPR 2024
☆131Updated 5 months ago
qirui-chen / MultiHop-EgoQA
[AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
☆27Updated 5 months ago
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆64Updated last year
callsys / ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆79Updated last year
Share14 / ShareGemini
☆31Updated last year
TencentARC / SEED-Bench-R1
☆91Updated 4 months ago
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆125Updated 3 months ago
CeeZh / LLoVi
Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"
☆101Updated last year
JoeLeelyf / OVO-Bench
[CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
☆100Updated 3 months ago
z-x-yang / DoraemonGPT
Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models
☆86Updated last year
xvjiarui / IMProv
IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks
☆57Updated last year
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆61Updated last year
AdaCheng / EgoThink
[CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan…
☆61Updated 7 months ago
si0wang / VisVM
☆45Updated 10 months ago
stogiannidis / srbench
Source code for the Paper "Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models"
☆16Updated last month
fpv-iplab / EASG
Action Scene Graphs for Long-Form Understanding of Egocentric Videos (CVPR 2024)
☆43Updated 6 months ago