Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
☆356 · Apr 14, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for Video-MME-v2
Users that are interested in Video-MME-v2 are comparing it to the libraries listed below.
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM · ☆20 · May 22, 2025 · Updated 11 months ago
- [CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding · ☆57 · Mar 16, 2026 · Updated last month
- ICCV 2023 - AdaptGuard: Defending Against Universal Attacks for Model Adaptation · ☆11 · Dec 23, 2023 · Updated 2 years ago
- Collection of papers about video-audio understanding · ☆25 · Dec 26, 2025 · Updated 4 months ago
- [ACL 2026] Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark · ☆24 · Apr 13, 2026 · Updated 2 weeks ago
- CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation · ☆51 · Apr 9, 2026 · Updated 2 weeks ago
- [CVPR 2026 Highlight] WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning · ☆73 · Mar 25, 2026 · Updated last month
- This repository provides the official implementation of VTBench, a benchmark designed to evaluate the performance of visual tokenizers (V… · ☆35 · Jul 30, 2025 · Updated 8 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" · ☆20 · Oct 17, 2024 · Updated last year
- The Source Code for OmniVideoBench @ICLR 2026 · ☆72 · Feb 12, 2026 · Updated 2 months ago
- Universal Video Temporal Grounding with Generative Multi-modal Large Language Models · ☆51 · Mar 20, 2026 · Updated last month
- [CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs · ☆129 · Updated this week
- [ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning · ☆79 · Apr 7, 2025 · Updated last year
- ☆32 · Jul 29, 2024 · Updated last year
- This is the project for 'USG'. · ☆38 · Apr 7, 2025 · Updated last year
- The official implementation of Bayesian Cross-modal Alignment Learning for Few-Shot Out-of-Distribution Generalization (AAAI 2023). · ☆12 · Oct 13, 2025 · Updated 6 months ago
- Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos · ☆70 · Sep 5, 2025 · Updated 7 months ago
- [CVPR 2026] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice · ☆83 · Feb 27, 2026 · Updated 2 months ago
- A local AI assistant running on your device. It turns your files into actionable memory. · ☆55 · Mar 24, 2026 · Updated last month
- Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation · ☆48 · Dec 11, 2024 · Updated last year
- [ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration R… · ☆112 · Jul 9, 2025 · Updated 9 months ago
- ✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models · ☆647 · Dec 23, 2024 · Updated last year
- [NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. · ☆122 · Jul 27, 2024 · Updated last year
- A unified reinforcement learning toolbox for joint RL on language models and diffusion models · ☆80 · Mar 31, 2026 · Updated last month
- ☆29 · Jun 17, 2024 · Updated last year
- ✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction · ☆2,508 · Mar 28, 2025 · Updated last year
- [CVPR 2026] OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models · ☆74 · Apr 20, 2026 · Updated last week
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction · ☆172 · Mar 23, 2025 · Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts · ☆17 · Apr 2, 2025 · Updated last year
- A Fine-grained Benchmark for Video Captioning and Retrieval · ☆28 · Jul 16, 2025 · Updated 9 months ago
- [CVPR 2024] Adapting Short-Term Transformers for Action Detection in Untrimmed Videos · ☆11 · Jun 11, 2024 · Updated last year
- Med-DANet Series (ECCV 2022 & WACV 2024) · ☆13 · Jan 2, 2024 · Updated 2 years ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency · ☆138 · Aug 5, 2025 · Updated 8 months ago
- Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner" · ☆15 · Aug 9, 2023 · Updated 2 years ago
- ☆13 · Oct 8, 2021 · Updated 4 years ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs · ☆46 · Apr 1, 2026 · Updated 3 weeks ago
- ☆10 · Jul 28, 2022 · Updated 3 years ago
- Scaling Agentic Environments Automatically. · ☆62 · Mar 26, 2026 · Updated last month
- LLM Reasoning Benchmark & Chain-of-Thoughts Dataset for Chemistry · ☆50 · Oct 9, 2025 · Updated 6 months ago