MME-Benchmarks / Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
☆542 Updated this week
Alternatives and similar repositories for Video-MME:
Users interested in Video-MME are comparing it to the repositories listed below.
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video] ☆489 Updated last week
- Long Context Transfer from Language to Vision ☆374 Updated last month
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models ☆217 Updated 7 months ago
- [CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding ☆364 Updated 5 months ago
- ☆369 Updated 2 months ago
- ✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models ☆634 Updated 4 months ago
- [CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments". ☆272 Updated 10 months ago
- This is the first paper to explore how to effectively use RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-sta… ☆540 Updated 3 weeks ago
- [ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua… ☆419 Updated 4 months ago
- 🔥🔥MLVU: Multi-task Long Video Understanding Benchmark ☆196 Updated last month
- Explore the Multimodal "Aha Moment" on 2B Model ☆583 Updated last month
- MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning ☆590 Updated this week
- ☆328 Updated last year
- 📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models. ☆535 Updated last month
- VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling ☆402 Updated this week
- ✨First Open-Source R1-like Video-LLM [2025/02/18] ☆331 Updated 2 months ago
- [CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allo… ☆334 Updated 8 months ago
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions ☆218 Updated 10 months ago
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆438 Updated 5 months ago
- [CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness ☆358 Updated 2 months ago
- [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding ☆614 Updated 3 months ago
- PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models ☆257 Updated last year
- Efficient Multimodal Large Language Models: A Survey ☆343 Updated last week
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey ☆422 Updated 3 months ago
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey ☆542 Updated 2 weeks ago
- Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with g… ☆361 Updated 2 weeks ago
- 🔥🔥First-ever hour-scale video understanding models ☆309 Updated 2 weeks ago
- Official repository for the paper PLLaVA ☆649 Updated 9 months ago
- Awesome papers & datasets specifically focused on long-term videos. ☆270 Updated 5 months ago
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆578 Updated 7 months ago