Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
☆368May 24, 2026Updated last month
Alternatives and similar repositories for Video-MME-v2
Users who are interested in Video-MME-v2 are comparing it to the libraries listed below.
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM☆20May 22, 2025Updated last year
- ✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis☆780Dec 8, 2025Updated 6 months ago
- [CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding☆69Mar 16, 2026Updated 3 months ago
- ICCV 2023 - AdaptGuard: Defending Against Universal Attacks for Model Adaptation☆11Dec 23, 2023Updated 2 years ago
- [ACL 2026 Oral] Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark☆25Apr 13, 2026Updated 2 months ago
- CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation☆54Apr 9, 2026Updated 2 months ago
- [CVPR 2026 Highlight] WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning☆92Jun 18, 2026Updated 2 weeks ago
- ✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy☆305May 14, 2025Updated last year
- This repository provides the official implementation of VTBench, a benchmark designed to evaluate the performance of visual tokenizers (V…☆35Jul 30, 2025Updated 11 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆20Oct 17, 2024Updated last year
- The Source Code for OmniVideoBench @ICLR 2026☆73Feb 12, 2026Updated 4 months ago
- [CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs☆149Apr 27, 2026Updated 2 months ago
- [ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning☆74Apr 7, 2025Updated last year
- Official Repo for CVPR 2025 Paper -- DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos☆17Mar 16, 2026Updated 3 months ago
- ☆32Jul 29, 2024Updated last year
- This is the project for 'USG'.☆39Jun 21, 2026Updated last week
- Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos☆71Sep 5, 2025Updated 10 months ago
- A local AI assistant running on your device. It turns your files into actionable memory.☆55Mar 24, 2026Updated 3 months ago
- Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation☆48Dec 11, 2024Updated last year
- [CVPR2026] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice☆87Feb 27, 2026Updated 4 months ago
- [ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration R…☆113Jul 9, 2025Updated 11 months ago
- ✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models☆650Dec 23, 2024Updated last year
- [CVPR'26] VisPlay: Self-Evolving Vision-Language Models☆62Feb 25, 2026Updated 4 months ago
- [ICML 2026] a unified reinforcement learning toolbox for joint RL on language models and diffusion models☆89May 26, 2026Updated last month
- ✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction☆2,520Mar 28, 2025Updated last year
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction☆178Mar 23, 2025Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts☆18Apr 2, 2025Updated last year
- [CVPR 2026] OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models☆93Apr 20, 2026Updated 2 months ago
- A Fine-grained Benchmark for Video Captioning and Retrieval☆30Jul 16, 2025Updated 11 months ago
- [CVPR 2024] Adapting Short-Term Transformers for Action Detection in Untrimmed Videos☆11Jun 11, 2024Updated 2 years ago
- Med-DANet Series (ECCV 2022 & WACV 2024)☆13Jan 2, 2024Updated 2 years ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency☆135Aug 5, 2025Updated 11 months ago
- Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner"☆15Aug 9, 2023Updated 2 years ago
- ☆13Oct 8, 2021Updated 4 years ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆49May 7, 2026Updated last month
- Scaling Agentic Environments Automatically.☆66Mar 26, 2026Updated 3 months ago
- LLM Reasoning Benchmark & Chain-of-Thoughts Dataset for Chemistry☆55Oct 9, 2025Updated 8 months ago
- This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with H…☆16May 21, 2024Updated 2 years ago
- 🔥🔥MLVU: Multi-task Long Video Understanding Benchmark☆262Apr 13, 2026Updated 2 months ago