alanaai / EVUD
Egocentric Video Understanding Dataset (EVUD)
☆29 · Updated 11 months ago
Alternatives and similar repositories for EVUD
Users interested in EVUD are comparing it to the repositories listed below.
- ☆69 · Updated 5 months ago
- E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024) ☆58 · Updated 4 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆65 · Updated 8 months ago
- ☆30 · Updated 10 months ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆34 · Updated 6 months ago
- Official implementation for CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding ☆45 · Updated last year
- Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023) ☆42 · Updated last year
- Code and Dataset for the CVPRW Paper "Where did I leave my keys? – Episodic-Memory-Based Question Answering on Egocentric Videos" ☆25 · Updated last year
- Can 3D Vision-Language Models Truly Understand Natural Language? ☆21 · Updated last year
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆82 · Updated last month
- [NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking" ☆35 · Updated 6 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆47 · Updated 2 months ago
- [NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs ☆18 · Updated 7 months ago
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos ☆24 · Updated last week
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆59 · Updated 2 months ago
- Awesome paper list for multi-modal LLMs with grounding ability ☆17 · Updated 10 months ago
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks ☆57 · Updated 8 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆28 · Updated 8 months ago
- ☆81 · Updated 2 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?" ☆58 · Updated last year
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆75 · Updated 7 months ago
- [CVPR 2024] Data and benchmark code for the EgoExoLearn dataset ☆59 · Updated 9 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆89 · Updated last month
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆39 · Updated last month
- Official PyTorch code of GroundVQA (CVPR'24) ☆61 · Updated 8 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆60 · Updated 11 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆85 · Updated 9 months ago
- Language Repository for Long Video Understanding ☆31 · Updated 11 months ago
- ☆43 · Updated 5 months ago
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering" ☆95 · Updated 7 months ago