JoseponLee / IntentQA
Official repository for "IntentQA: Context-aware Video Intent Reasoning" from ICCV 2023.
★16 · Updated 5 months ago
Alternatives and similar repositories for IntentQA:
Users who are interested in IntentQA are comparing it to the repositories listed below:
- E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024) ★58 · Updated 3 months ago
- ★71 · Updated 5 months ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ★31 · Updated 5 months ago
- Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" NeurIPS 23 Spotlight ★37 · Updated last year
- Envolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward ★32 · Updated last month
- FreeVA: Offline MLLM as Training-Free Video Assistant ★59 · Updated 10 months ago
- Official PyTorch code of GroundVQA (CVPR'24) ★60 · Updated 7 months ago
- VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) ★28 · Updated last month
- [CVPR 2024] Official PyTorch implementation of the paper "One For All: Video Conversation is Feasible Without Video Instruction Tuning" ★32 · Updated last year
- Pytorch implementation for Egoinstructor at CVPR 2024 ★19 · Updated 5 months ago
- ★33 · Updated 7 months ago
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ★75 · Updated 3 weeks ago
- Code and Dataset for the CVPRW Paper "Where did I leave my keys? - Episodic-Memory-Based Question Answering on Egocentric Videos" ★25 · Updated last year
- Egocentric Video Understanding Dataset (EVUD) ★29 · Updated 10 months ago
- ★89 · Updated 4 months ago
- [NeurIPS 2024] The official code of paper "Automated Multi-level Preference for MLLMs" ★19 · Updated 7 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ★70 · Updated 10 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos ★41 · Updated last year
- Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023) ★42 · Updated last year
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ★47 · Updated last month
- VisualGPTScore for visio-linguistic reasoning ★27 · Updated last year
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding ★53 · Updated 10 months ago
- The official repository for paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models". ★36 · Updated 2 months ago
- [NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs ★18 · Updated 6 months ago
- ★32 · Updated last year
- R1-like Video-LLM for Temporal Grounding ★84 · Updated 3 weeks ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment ★29 · Updated 7 months ago
- Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" ★25 · Updated 3 weeks ago
- Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023) ★50 · Updated last year
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ★74 · Updated 6 months ago