JoseponLee / IntentQA
Official repository for "IntentQA: Context-aware Video Intent Reasoning" from ICCV 2023.
★14 · Updated 3 months ago
Alternatives and similar repositories for IntentQA:
Users interested in IntentQA are comparing it to the repositories listed below.
- E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024) ★53 · Updated last month
- ★68 · Updated 3 months ago
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval ★35 · Updated 6 months ago
- Egocentric Video Understanding Dataset (EVUD) ★26 · Updated 7 months ago
- [NeurIPS 2022 Spotlight] RLIP: Relational Language-Image Pre-training and a series of other methods to solve HOI detection and Scene Graph… ★73 · Updated 9 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ★63 · Updated 8 months ago
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering" ★93 · Updated 4 months ago
- Official PyTorch code of GroundVQA (CVPR'24) ★56 · Updated 5 months ago
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding ★49 · Updated 8 months ago
- Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023] ★95 · Updated 8 months ago
- ★87 · Updated 2 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ★56 · Updated 8 months ago
- [CVPR 2024] Official PyTorch implementation of the paper "One For All: Video Conversation is Feasible Without Video Instruction Tuning" ★30 · Updated last year
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ★28 · Updated 3 months ago
- Code and Dataset for the CVPRW Paper "Where did I leave my keys? – Episodic-Memory-Based Question Answering on Egocentric Videos" ★22 · Updated last year
- ★28 · Updated last year
- [NeurIPS 2022] Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding ★47 · Updated 11 months ago
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos ★21 · Updated 5 months ago
- A reading list of papers about Visual Grounding. ★31 · Updated 2 years ago
- [CVPR 2024 Champions] Solutions for EgoVis Challenges in CVPR 2024 ★123 · Updated 7 months ago
- This is the official repository for the paper "Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World"… ★46 · Updated 11 months ago
- ★31 · Updated 5 months ago
- ★39 · Updated 10 months ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment ★26 · Updated 5 months ago
- The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition" ★70 · Updated last month
- Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale" (ECCV 2024) ★50 · Updated 5 months ago
- Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023) ★39 · Updated 10 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ★70 · Updated 4 months ago
- [CVPR'25] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ★51 · Updated this week
- Official PyTorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023) ★50 · Updated last year