OpenGVLab / InternVideo2

☆210

Related projects

Alternatives and complementary repositories for InternVideo2

bytedance / tarsier
Tarsier, a family of large-scale video-language models designed to generate high-quality video descriptions, together with g…
☆144 Updated 2 weeks ago
Ahnsun / merlin
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
☆82 Updated 4 months ago
huangb23 / VTimeLLM
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
☆226 Updated 5 months ago
md-mohaiminul / VideoRecap
☆165 Updated 4 months ago
RenShuhuai-Andy / TimeChat
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
☆289 Updated 5 months ago
boheumd / MA-LMM
(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
☆244 Updated 4 months ago
IVGSZ / Flash-VStream
This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"
☆130 Updated 3 months ago
OpenGVLab / unmasked_teacher
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
☆295 Updated 5 months ago
mbzuai-oryx / Video-LLaVA
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
☆245 Updated 10 months ago
mbzuai-oryx / VideoGPT-plus
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
☆217 Updated 3 months ago
OpenGVLab / video-mamba-suite
A suite for modeling video with Mamba
☆238 Updated 6 months ago
TencentARC / ST-LLM
[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
☆125 Updated 2 months ago
baaivision / DIVA
Diffusion Feedback Helps CLIP See Better
☆215 Updated 2 months ago
EasonXiao-888 / UVCOM
[CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
☆75 Updated 4 months ago
yukw777 / VideoBLIP
Supercharged BLIP-2 that can handle videos
☆116 Updated 11 months ago
imagegridworth / IG-VLM
☆120 Updated last month
YueFan1014 / VideoAgent
This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)
☆131 Updated 2 months ago
mutonix / Vript
☆127 Updated 2 weeks ago
WHB139426 / Grounded-Video-LLM
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
☆63 Updated last week
Ziyang412 / VideoTree
Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
☆80 Updated 3 months ago
llyx97 / TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆89 Updated last week
baaivision / EVE
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
☆231 Updated last month
sming256 / AdaTAD
[CVPR2024] The official implementation of AdaTAD: End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
☆32 Updated 4 months ago
NeeluMadan / ViFM_Survey
Foundation Models for Video Understanding: A Survey
☆97 Updated 2 months ago
ShareGPT4Omni / ShareGPT4V
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
☆157 Updated 4 months ago
EvolvingLMMs-Lab / LongVA
Long Context Transfer from Language to Vision
☆334 Updated 3 weeks ago
VectorSpaceLab / Video-XL
🔥🔥 First-ever hour-scale video understanding models
☆166 Updated 3 weeks ago
X-PLUG / mPLUG-2
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
☆220 Updated last year
sudo-Boris / mr-Blip
Official Implementation of "The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval"
☆46 Updated this week