Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
☆361May 24, 2026Updated 2 weeks ago
Alternatives and similar repositories for Video-MME-v2
Users who are interested in Video-MME-v2 are comparing it to the repositories listed below.
- ✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis☆779Dec 8, 2025Updated 6 months ago
- [CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding☆65Mar 16, 2026Updated 2 months ago
- ICCV 2023 - AdaptGuard: Defending Against Universal Attacks for Model Adaptation☆11Dec 23, 2023Updated 2 years ago
- Collection of papers about video-audio understanding☆25Dec 26, 2025Updated 5 months ago
- [ACL2026] Uni-MMMU : A Massive Multi-discipline Multimodal Unified Benchmark☆25Apr 13, 2026Updated 2 months ago
- CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation☆51Apr 9, 2026Updated 2 months ago
- [CVPR 2026 Highlight] WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning☆82Mar 25, 2026Updated 2 months ago
- This repository provides the official implementation of VTBench, a benchmark designed to evaluate the performance of visual tokenizers (V…☆35Jul 30, 2025Updated 10 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆20Oct 17, 2024Updated last year
- The Source Code for OmniVideoBench @ICLR 2026☆73Feb 12, 2026Updated 4 months ago
- Universal Video Temporal Grounding with Generative Multi-modal Large Language Models☆55May 20, 2026Updated 3 weeks ago
- [CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs☆142Apr 27, 2026Updated last month
- [ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning☆75Apr 7, 2025Updated last year
- Official Repo for CVPR 2025 Paper -- DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos☆17Mar 16, 2026Updated 2 months ago
- This is the project for 'USG'.☆39Apr 7, 2025Updated last year
- Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos☆71Sep 5, 2025Updated 9 months ago
- Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation☆48Dec 11, 2024Updated last year
- [ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration R…☆112Jul 9, 2025Updated 11 months ago
- ✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models☆650Dec 23, 2024Updated last year
- [CVPR'26] VisPlay: Self-Evolving Vision-Language Models☆57Feb 25, 2026Updated 3 months ago
- [Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.☆130Jul 27, 2024Updated last year
- [ICML 2026] a unified reinforcement learning toolbox for joint RL on language models and diffusion models☆83May 26, 2026Updated 2 weeks ago
- ✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction☆2,513Mar 28, 2025Updated last year
- [ICML'24 Oral] Official code repo for ICML2024 paper "Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning wi…☆32Jun 21, 2024Updated last year
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction☆178Mar 23, 2025Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts☆18Apr 2, 2025Updated last year
- [CVPR 2026] OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models☆91Apr 20, 2026Updated last month
- A Fine-grained Benchmark for Video Captioning and Retrieval☆30Jul 16, 2025Updated 10 months ago
- [CVPR 2024] Adapting Short-Term Transformers for Action Detection in Untrimmed Videos☆11Jun 11, 2024Updated 2 years ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency☆136Aug 5, 2025Updated 10 months ago
- ☆13Oct 8, 2021Updated 4 years ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆49May 7, 2026Updated last month
- A real-time video understanding foundation model with gated cross-attention. Offline & real-time inference.☆138Jun 1, 2026Updated last week
- Scaling Agentic Environments Automatically.☆64Mar 26, 2026Updated 2 months ago
- LLM Reasoning Benchmark & Chain-of-Thoughts Dataset for Chemistry☆53Oct 9, 2025Updated 8 months ago
- This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with H…☆16May 21, 2024Updated 2 years ago
- 🔥🔥MLVU: Multi-task Long Video Understanding Benchmark☆261Apr 13, 2026Updated last month
- Research on "Many-Shot Jailbreaking" in Large Language Models (LLMs). It unveils a novel technique capable of bypassing the safety mechan…☆16Aug 6, 2024Updated last year
- A curated list of papers and resources for text-to-image evaluation.☆30Sep 6, 2023Updated 2 years ago