ttengwang / Awesome_Long_Form_Video_Understanding

Awesome papers & datasets specifically focused on long-term videos.

☆315

Alternatives and similar repositories for Awesome_Long_Form_Video_Understanding

Users who are interested in Awesome_Long_Form_Video_Understanding are comparing it to the repositories listed below.

huangb23 / VTimeLLM
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
☆288 Updated last year
RenShuhuai-Andy / TimeChat
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
☆393 Updated 5 months ago
www-Ye / Time-R1
R1-like Video-LLM for Temporal Grounding
☆120 Updated 3 months ago
boheumd / MA-LMM
(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
☆333 Updated last year
gyxxyg / TRACE
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling
☆126 Updated last month
OpenGVLab / VideoChat-Flash
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
☆469 Updated 4 months ago
tsb0601 / MMVP
☆354 Updated last year
scofield7419 / Video-of-Thought
Video Chain of Thought, Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
☆166 Updated 7 months ago
Ziyang412 / VideoTree
Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
☆139 Updated 3 months ago
JUNJIE99 / MLVU
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
☆226 Updated last month
deepcs233 / Visual-CoT
[Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …
☆378 Updated 9 months ago
sudo-Boris / mr-Blip
Official Implementation of "Chrono: A Simple Blueprint for Representing Time in MLLMs"
☆91 Updated 7 months ago
TimeMarker-LLM / TimeMarker
A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
☆98 Updated 10 months ago
OpenGVLab / VideoChat-R1
[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
☆208 Updated 2 weeks ago
Wang-Xiaodong1899 / Open-R1-Video
✨First Open-Source R1-like Video-LLM [2025/02/18]
☆368 Updated 7 months ago
imagegridworth / IG-VLM
☆138 Updated last year
llyx97 / TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆124 Updated 6 months ago
NeeluMadan / ViFM_Survey
Foundation Models for Video Understanding: A Survey
☆139 Updated 3 months ago
yaolinli / TimeChat-Online
[ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
☆81 Updated last month
yongliang-wu / NumPro
[CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga
☆123 Updated this week
egoschema / EgoSchema
☆101 Updated 9 months ago
gyxxyg / VTG-LLM
[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
☆113 Updated 10 months ago
TencentARC / ST-LLM
[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
☆150 Updated last year
jpthu17 / EMCL
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
☆140 Updated last year
pipixin321 / Awesome-Video-MLLMs
Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding
☆46 Updated last month
wxh1996 / VideoAgent
☆116 Updated 5 months ago
WHB139426 / Grounded-Video-LLM
[EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
☆125 Updated last month
Wang-Xiaodong1899 / CVPR25-MLLM-Paper-List
🔥CVPR 2025 Multimodal Large Language Models Paper List
☆155 Updated 7 months ago
YueFan1014 / VideoAgent
This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)
☆260 Updated 10 months ago
haokunwen / Awesome-Composed-Image-Retrieval
Collection of Composed Image Retrieval (CIR) papers.
☆267 Updated last month