VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆56Mar 9, 2025Updated last year
Alternatives and similar repositories for VideoNIAH
Users that are interested in VideoNIAH are comparing it to the libraries listed below.
- 🔥🔥MLVU: Multi-task Long Video Understanding Benchmark☆249Apr 13, 2026Updated 2 weeks ago
- Official code of *Towards Event-oriented Long Video Understanding*☆12Jul 26, 2024Updated last year
- ☆37Nov 8, 2024Updated last year
- [Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.☆122Jul 27, 2024Updated last year
- Official code for the ICLR 2025 paper, "Ada-K Routing: Boosting the Efficiency of MoE-based LLMs"☆12Mar 1, 2025Updated last year
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …☆131Apr 4, 2025Updated last year
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]☆21Feb 27, 2025Updated last year
- [EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models.☆27Nov 18, 2025Updated 5 months ago
- [ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark☆144Jul 9, 2025Updated 9 months ago
- ☆18Jul 10, 2024Updated last year
- MR. Video: MapReduce is the Principle for Long Video Understanding☆31Apr 23, 2025Updated last year
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…☆125Nov 25, 2024Updated last year
- Official code for CVPR 2024 paper, "SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models"☆16Apr 22, 2024Updated 2 years ago
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"☆154Sep 10, 2024Updated last year
- ☆112Dec 30, 2024Updated last year
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio☆54Jul 11, 2025Updated 9 months ago
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning☆36Jul 15, 2025Updated 9 months ago
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task☆36Apr 14, 2025Updated last year
- 🔥🔥First-ever hour scale video understanding models☆621Jul 14, 2025Updated 9 months ago
- [ICCV'25] The official code of paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models"☆74Jan 13, 2026Updated 3 months ago
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely…☆55Sep 4, 2023Updated 2 years ago
- [NeurIPS 2025 Spotlight] Official PyTorch implementation of Vgent☆45Nov 30, 2025Updated 5 months ago
- ✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis☆759Dec 8, 2025Updated 4 months ago
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".☆203Jun 18, 2025Updated 10 months ago
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning"☆47Feb 19, 2026Updated 2 months ago
- Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks (KDD 2023)☆28Feb 16, 2024Updated 2 years ago
- [NeurIPS 2024] Artemis: Towards Referential Understanding in Complex Videos☆27Apr 8, 2025Updated last year
- Comprehensive benchmark for video text understanding☆28Jun 4, 2025Updated 10 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆213Jan 6, 2025Updated last year
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention☆65Aug 30, 2025Updated 8 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant☆69Jun 9, 2024Updated last year
- ✨First Open-Source R1-like Video-LLM [2025/02/18]☆384Feb 23, 2025Updated last year
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?☆92Jul 13, 2025Updated 9 months ago
- ☆52Oct 20, 2025Updated 6 months ago
- ☆157Oct 31, 2024Updated last year
- Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models☆52Apr 16, 2026Updated 2 weeks ago
- Extending context length of visual language models☆12Dec 18, 2024Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model☆282Jun 25, 2024Updated last year
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment☆29Sep 27, 2024Updated last year