zyayoung / Awesome-Video-LLMsLinks

Explore VLM-Eval, a framework for evaluating Video Large Language Models, enhancing your video analysis with cutting-edge AI technology.

☆36

Alternatives and similar repositories for Awesome-Video-LLMs

Users that are interested in Awesome-Video-LLMs are comparing it to the libraries listed below

Sorting:

Ziyang412 / VideoTree
Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
☆142Updated 4 months ago
Becomebright / GroundVQA
Official PyTorch code of GroundVQA (CVPR'24)
☆64Updated last year
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆125Updated 3 months ago
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆64Updated last year
Hon-Wong / Elysium
[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM
☆86Updated last year
ziplab / LongVLM
☆104Updated last year
TimeMarker-LLM / TimeMarker
A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
☆101Updated 11 months ago
imagegridworth / IG-VLM
☆138Updated last year
Ahnsun / merlin
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
☆94Updated last year
WHB139426 / Grounded-Video-LLM
[EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
☆131Updated 2 months ago
llyx97 / TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆125Updated 7 months ago
DCDmllm / Momentor
☆80Updated 11 months ago
TencentARC / ST-LLM
[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
☆150Updated last year
yonseivnl / vlm-rlaif
ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
☆76Updated last year
wdrink / OmniVid
☆53Updated last year
farewellthree / BT-Adapter
[CVPR 2024] Official PyTorch implementation of the paper "One For All: Video Conversation is Feasible Without Video Instruction Tuning"
☆35Updated last year
amazon-science / QA-ViT
☆69Updated last year
wxh1996 / VideoAgent
☆119Updated 6 months ago
SY-Xuan / Pink
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
☆94Updated 9 months ago
gyxxyg / TRACE
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Casual Event Modeling
☆132Updated 2 months ago
rxtan2 / Koala-video-llm
☆36Updated last year
callsys / ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆79Updated last year
PolyU-ChenLab / ETBench
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
☆67Updated 9 months ago
isekai-portal / Link-Context-Learning
☆99Updated last year
HengLan / CGSTVG
[CVPR 2024] Context-Guided Spatio-Temporal Video Grounding
☆61Updated last year
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆60Updated 3 months ago
mbzuai-oryx / Video-LLaVA
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
☆259Updated 3 months ago
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆167Updated last year
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆179Updated last year
wengzejia1 / Open-VCLIP
☆118Updated last year