JoseponLee / IntentQA
Official repository for "IntentQA: Context-aware Video Intent Reasoning" from ICCV 2023.
☆22 Updated last year
Alternatives and similar repositories for IntentQA
Users who are interested in IntentQA are comparing it to the repositories listed below.
- E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024) ☆71 Updated 10 months ago
- ☆140 Updated last year
- ☆104 Updated 11 months ago
- ☆80 Updated last year
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency ☆58 Updated 6 months ago
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆128 Updated 4 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos ☆44 Updated last year
- Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos" ☆146 Updated 5 months ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆37 Updated last year
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆125 Updated 8 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ☆83 Updated last year
- Official PyTorch code of GroundVQA (CVPR'24) ☆64 Updated last year
- [ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs ☆61 Updated 9 months ago
- Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023) ☆52 Updated last year
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds ☆96 Updated last year
- Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023] ☆100 Updated last year
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆143 Updated 11 months ago
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling ☆138 Updated 3 months ago
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering" ☆104 Updated last year
- ☆107 Updated last year
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability ☆103 Updated last year
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆65 Updated last year
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding ☆62 Updated last year
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆76 Updated last year
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆80 Updated last year
- [NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. ☆111 Updated last year
- [EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models ☆137 Updated 3 months ago
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval ☆41 Updated 7 months ago
- 【NeurIPS 2024】 The official code of paper "Automated Multi-level Preference for MLLMs" ☆20 Updated last year