NeeluMadan / ViFM_Survey

Foundation Models for Video Understanding: A Survey

☆129

Alternatives and similar repositories for ViFM_Survey

Users who are interested in ViFM_Survey are comparing it to the repositories listed below

wengzejia1 / Open-VCLIP
☆117 Updated last year
sudo-Boris / mr-Blip
Official Implementation of "Chrono: A Simple Blueprint for Representing Time in MLLMs"
☆89 Updated 5 months ago
boheumd / MA-LMM
(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
☆323 Updated last year
Ziyang412 / VideoTree
Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
☆133 Updated last month
gyxxyg / VTG-LLM
[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
☆107 Updated 8 months ago
gyxxyg / TRACE
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling
☆110 Updated 3 weeks ago
huangb23 / VTimeLLM
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
☆284 Updated last year
TimeMarker-LLM / TimeMarker
A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
☆95 Updated 8 months ago
Visual-AI / FROSTER
The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition"
☆85 Updated 6 months ago
TencentARC / ST-LLM
[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
☆149 Updated 10 months ago
WHB139426 / Grounded-Video-LLM
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
☆118 Updated 4 months ago
DCDmllm / Momentor
☆76 Updated 8 months ago
Becomebright / GroundVQA
Official PyTorch code of GroundVQA (CVPR'24)
☆61 Updated 10 months ago
ttengwang / Awesome_Long_Form_Video_Understanding
Awesome papers & datasets specifically focused on long-term videos.
☆284 Updated 8 months ago
HengLan / CGSTVG
[CVPR 2024] Context-Guided Spatio-Temporal Video Grounding
☆56 Updated last year
wxh1996 / VideoAgent
☆107 Updated 3 months ago
yongliang-wu / NumPro
[CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga
☆112 Updated 4 months ago
muzairkhattak / ViFi-CLIP
[CVPR 2023] Official repository of paper titled "Fine-tuned CLIP models are efficient video learners".
☆285 Updated last year
ziplab / LongVLM
☆98 Updated last year
contrastive / FreeVideoLLM
☆81 Updated 9 months ago
benedettaliberatori / T3AL
Official implementation of "Test-Time Zero-Shot Temporal Action Localization", CVPR 2024
☆64 Updated 10 months ago
MCG-NJU / VideoChat-Online
[CVPR 2025] Online Video Understanding: OVBench and VideoChat-Online
☆53 Updated last month
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆108 Updated last week
farewellthree / BT-Adapter
[CVPR 2024] Official PyTorch implementation of the paper "One For All: Video Conversation is Feasible Without Video Instruction Tuning"
☆34 Updated last year
mbzuai-oryx / Video-LLaVA
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
☆257 Updated this week
HJYao00 / Side4Video
☆40 Updated last year
Ziyang412 / UCoFiA
PyTorch code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)
☆65 Updated last year
gls0425 / LinVT
LinVT: Empower Your Image-level Large Language Model to Understand Videos
☆82 Updated 7 months ago
mlvlab / vid-TLDR
Official implementation of CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer".
☆52 Updated last year
mlvlab / Flipped-VQA
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
☆76 Updated 4 months ago