VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
★55 · Updated Mar 9, 2025
Alternatives and similar repositories for VideoNIAH
Users interested in VideoNIAH are comparing it to the repositories listed below.
- 🔥🔥 MLVU: Multi-task Long Video Understanding Benchmark · ★242 · Updated Aug 21, 2025
- Official code of *Towards Event-oriented Long Video Understanding* · ★12 · Updated Jul 26, 2024
- ★37 · Updated Nov 8, 2024
- [NeurIPS '24 D&B] Official dataloader and evaluation scripts for LongVideoBench · ★115 · Updated Jul 27, 2024
- Official code for the ICLR 2025 paper "Ada-K Routing: Boosting the Efficiency of MoE-based LLMs" · ★12 · Updated Mar 1, 2025
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … · ★130 · Updated Apr 4, 2025
- [CVPR 2025] The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" · ★21 · Updated Feb 27, 2025
- [EMNLP 2025 Main] Official implementation of "VRoPE: Rotary Position Embedding for Video Large Language Models" · ★27 · Updated Nov 18, 2025
- [ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark · ★141 · Updated Jul 9, 2025
- ★18 · Updated Jul 10, 2024
- MR. Video: MapReduce is the Principle for Long Video Understanding · ★31 · Updated Apr 23, 2025
- Long Context Transfer from Language to Vision · ★402 · Updated Mar 18, 2025
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… · ★123 · Updated Nov 25, 2024
- Official code for the CVPR 2024 paper "SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models" · ★16 · Updated Apr 22, 2024
- [ECCV 2024 🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" · ★154 · Updated Sep 10, 2024
- ★109 · Updated Dec 30, 2024
- ✨✨ The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio · ★52 · Updated Jul 11, 2025
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning · ★36 · Updated Jul 15, 2025
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task · ★36 · Updated Apr 14, 2025
- 🔥🔥 First-ever hour-scale video understanding models · ★616 · Updated Jul 14, 2025
- [ICCV 2025] The official code of the paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models" · ★71 · Updated Jan 13, 2026
- [NeurIPS 2025 Spotlight] Official PyTorch implementation of Vgent · ★42 · Updated Nov 30, 2025
- ChatBridge: an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely… · ★54 · Updated Sep 4, 2023
- ✨✨ [CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis · ★732 · Updated Dec 8, 2025
- [CVPR 2025] Official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models" · ★203 · Updated Jun 18, 2025
- Code for the paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning" · ★47 · Updated Feb 19, 2026
- Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks (KDD 2023) · ★28 · Updated Feb 16, 2024
- [NeurIPS 2024] Artemis: Towards Referential Understanding in Complex Videos · ★27 · Updated Apr 8, 2025
- A comprehensive benchmark for video text understanding · ★28 · Updated Jun 4, 2025
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture · ★213 · Updated Jan 6, 2025
- ★48 · Updated Oct 20, 2025
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention · ★66 · Updated Aug 30, 2025
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? · ★90 · Updated Jul 13, 2025
- FreeVA: Offline MLLM as Training-Free Video Assistant · ★69 · Updated Jun 9, 2024
- ✨ First open-source R1-like Video-LLM [2025/02/18] · ★382 · Updated Feb 23, 2025
- ★157 · Updated Oct 31, 2024
- Extending the context length of visual language models · ★12 · Updated Dec 18, 2024
- Latest Advances on (RL-based) Multimodal Reasoning and Generation in Multimodal Large Language Models · ★48 · Updated Oct 30, 2025
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model · ★281 · Updated Jun 25, 2024