JoseponLee / IntentQALinks

Official repository for "IntentQA: Context-aware Video Intent Reasoning" from ICCV 2023.

☆22

Alternatives and similar repositories for IntentQA

Users that are interested in IntentQA are comparing it to the libraries listed below

Sorting:

PolyU-ChenLab / ETBench
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
☆71Updated 10 months ago
imagegridworth / IG-VLM
☆140Updated last year
egoschema / EgoSchema
☆104Updated 11 months ago
DCDmllm / Momentor
☆80Updated last year
appletea233 / Temporal-R1
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
☆58Updated 6 months ago
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆128Updated 4 months ago
yellow-binary-tree / HawkEye
Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos
☆44Updated last year
Ziyang412 / VideoTree
Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
☆146Updated 5 months ago
mu-cai / TemporalBench
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
☆37Updated last year
llyx97 / TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆125Updated 8 months ago
doc-doc / NExT-GQA
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
☆83Updated last year
Becomebright / GroundVQA
Official PyTorch code of GroundVQA (CVPR'24)
☆64Updated last year
linkangheng / Video-UTR
[ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs
☆61Updated 9 months ago
facebookresearch / ego4d-goalstep
Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023)
☆52Updated last year
Ahnsun / merlin
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
☆96Updated last year
facebookresearch / EgoVLPv2
Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023]
☆100Updated last year
showlab / VideoLISA
[NeurlPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
☆143Updated 11 months ago
gyxxyg / TRACE
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Casual Event Modeling
☆138Updated 3 months ago
CeeZh / LLoVi
Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"
☆104Updated last year
ziplab / LongVLM
☆107Updated last year
TimeMarker-LLM / TimeMarker
A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
☆103Updated last year
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆65Updated last year
HengLan / CGSTVG
[CVPR 2024] Context-Guided Spatio-Temporal Video Grounding
☆62Updated last year
yonseivnl / vlm-rlaif
ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
☆76Updated last year
callsys / ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆80Updated last year
longvideobench / LongVideoBench
[Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆111Updated last year
WHB139426 / Grounded-Video-LLM
[EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
☆137Updated 3 months ago
ExplainableML / EgoCVR
[ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
☆41Updated 7 months ago
takomc / amp
【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs"
☆20Updated last year
RifleZhang / LLaVA-Hound-DPO
☆155Updated last year