BradyFU / Video-MME

✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

☆489

Alternatives and similar repositories for Video-MME:

Users who are interested in Video-MME are comparing it to the repositories listed below.

RenShuhuai-Andy / TimeChat
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
☆348 Updated 4 months ago
EvolvingLMMs-Lab / LongVA
Long Context Transfer from Language to Vision
☆368 Updated last week
OpenGVLab / VideoChat-Flash
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
☆366 Updated last week
BradyFU / Woodpecker
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
☆633 Updated 3 months ago
apple / ml-slowfast-llava
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
☆206 Updated 6 months ago
huangb23 / VTimeLLM
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
☆261 Updated 9 months ago
YueFan1014 / VideoAgent
This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)
☆181 Updated 3 months ago
JUNJIE99 / MLVU
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
☆186 Updated this week
jy0205 / LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
☆569 Updated 5 months ago
bytedance / tarsier
Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with g…
☆324 Updated last month
thunlp / LLaVA-UHD
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
☆369 Updated this week
Leon1207 / Video-RAG-master
This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"
☆157 Updated last month
tsb0601 / MMVP
☆319 Updated last year
rese1f / MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
☆602 Updated last month
magic-research / PLLaVA
Official repository for the paper PLLaVA
☆643 Updated 7 months ago
OpenGVLab / OmniCorpus
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
☆329 Updated this week
ModalMinds / MM-EUREKA
MM-EUREKA: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning
☆425 Updated last week
Vision-CAIR / LongVU
☆365 Updated 3 weeks ago
Wang-Xiaodong1899 / Open-R1-Video
✨First Open-Source R1-like Video-LLM [2025/02/18]
☆289 Updated last month
zjysteven / lmms-finetune
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision,…
☆278 Updated last month
Kwai-YuanQi / MM-RLHF
The Next Step Forward in Multimodal LLM Alignment
☆135 Updated 2 weeks ago
showlab / Awesome-Unified-Multimodal-Models
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
☆415 Updated last week
Vision-CAIR / MiniGPT4-video
Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
☆603 Updated 3 months ago
boheumd / MA-LMM
[CVPR 2024] MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
☆285 Updated 8 months ago
RunpeiDong / DreamLLM
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
☆427 Updated 3 months ago
mbzuai-oryx / Video-LLaVA
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
☆256 Updated last year
VectorSpaceLab / Video-XL
🔥🔥First-ever hour scale video understanding models
☆253 Updated this week
pkunlp-icler / FastV
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua…
☆384 Updated 2 months ago
baaivision / EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆313 Updated 3 weeks ago
ShareGPT4Omni / ShareGPT4V
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
☆209 Updated 8 months ago