Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
☆350Apr 14, 2026Updated last month
Alternatives and similar repositories for Video-MME-v2
Users who are interested in Video-MME-v2 are comparing it to the libraries listed below.
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM☆20May 22, 2025Updated 11 months ago
- ✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis☆768Dec 8, 2025Updated 5 months ago
- ICCV 2023 - AdaptGuard: Defending Against Universal Attacks for Model Adaptation☆11Dec 23, 2023Updated 2 years ago
- Collection of papers about video-audio understanding☆25Dec 26, 2025Updated 4 months ago
- [ACL2026] Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark☆25Apr 13, 2026Updated last month
- CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation☆51Apr 9, 2026Updated last month
- This repository provides the official implementation of VTBench, a benchmark designed to evaluate the performance of visual tokenizers (V…☆35Jul 30, 2025Updated 9 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆20Oct 17, 2024Updated last year
- The Source Code for OmniVideoBench @ICLR 2026☆72Feb 12, 2026Updated 3 months ago
- Universal Video Temporal Grounding with Generative Multi-modal Large Language Models☆52Mar 20, 2026Updated 2 months ago
- [CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs☆132Apr 27, 2026Updated 3 weeks ago
- [ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning☆73Apr 7, 2025Updated last year
- ☆32Jul 29, 2024Updated last year
- This is the project for 'USG'.☆38Apr 7, 2025Updated last year
- Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos☆70Sep 5, 2025Updated 8 months ago
- [CVPR2026] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice☆85Feb 27, 2026Updated 2 months ago
- A local AI assistant running on your device. It turns your files into actionable memory.☆55Mar 24, 2026Updated last month
- Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation☆49Dec 11, 2024Updated last year
- [ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration R…☆112Jul 9, 2025Updated 10 months ago
- ✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models☆648Dec 23, 2024Updated last year
- [CVPR'26] VisPlay: Self-Evolving Vision-Language Models☆57Feb 25, 2026Updated 2 months ago
- [Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.☆125Jul 27, 2024Updated last year
- [ICML 2026] a unified reinforcement learning toolbox for joint RL on language models and diffusion models☆80Mar 31, 2026Updated last month
- ☆29Jun 17, 2024Updated last year
- ✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction☆2,512Mar 28, 2025Updated last year
- [CVPR 2026] OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models☆81Apr 20, 2026Updated last month
- [ICML'24 Oral] Official code repo for ICML2024 paper "Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning wi…☆32Jun 21, 2024Updated last year
- [CVPR 2025]Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction☆174Mar 23, 2025Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts☆17Apr 2, 2025Updated last year
- A Fine-grained Benchmark for Video Captioning and Retrieval☆28Jul 16, 2025Updated 10 months ago
- [CVPR 2024] Adapting Short-Term Transformers for Action Detection in Untrimmed Videos☆11Jun 11, 2024Updated last year
- Med-DANet Series (ECCV 2022 & WACV 2024)☆13Jan 2, 2024Updated 2 years ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency☆137Aug 5, 2025Updated 9 months ago
- Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner"☆15Aug 9, 2023Updated 2 years ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆49May 7, 2026Updated last week
- ☆10Jul 28, 2022Updated 3 years ago
- A real-time video understanding foundation model built on Llama-3.2-Vision, featuring comprehensively extended video processing and multi…☆138Apr 13, 2026Updated last month
- Scaling Agentic Environments Automatically.☆63Mar 26, 2026Updated last month
- This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with H…☆16May 21, 2024Updated last year