VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆57 · Mar 9, 2025 · Updated last year
Alternatives and similar repositories for VideoNIAH
Users that are interested in VideoNIAH are comparing it to the libraries listed below.
- 🔥🔥MLVU: Multi-task Long Video Understanding Benchmark ☆255 · Apr 13, 2026 · Updated last month
- Official code of *Towards Event-oriented Long Video Understanding* ☆12 · Jul 26, 2024 · Updated last year
- ☆37 · Nov 8, 2024 · Updated last year
- [Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. ☆125 · Jul 27, 2024 · Updated last year
- Official code for the ICLR 2025 paper, "Ada-K Routing: Boosting the Efficiency of MoE-based LLMs" ☆12 · Mar 1, 2025 · Updated last year
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆131 · Apr 4, 2025 · Updated last year
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025] ☆21 · Feb 27, 2025 · Updated last year
- [EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models. ☆27 · Nov 18, 2025 · Updated 6 months ago
- [ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark ☆144 · Jul 9, 2025 · Updated 10 months ago
- ☆18 · Jul 10, 2024 · Updated last year
- MR. Video: MapReduce is the Principle for Long Video Understanding ☆31 · Apr 23, 2025 · Updated last year
- Long Context Transfer from Language to Vision ☆403 · Mar 18, 2025 · Updated last year
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… ☆125 · Nov 25, 2024 · Updated last year
- Official code for CVPR 2024 paper, "SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models" ☆16 · Apr 22, 2024 · Updated 2 years ago
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" ☆155 · Sep 10, 2024 · Updated last year
- ☆111 · Dec 30, 2024 · Updated last year
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio ☆53 · Jul 11, 2025 · Updated 10 months ago
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning ☆37 · May 9, 2026 · Updated last week
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task ☆36 · Apr 14, 2025 · Updated last year
- 🔥🔥First-ever hour scale video understanding models ☆622 · Jul 14, 2025 · Updated 10 months ago
- [ICCV'25] The official code of paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models" ☆74 · Jan 13, 2026 · Updated 4 months ago
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely… ☆55 · Sep 4, 2023 · Updated 2 years ago
- [NeurIPS 2025 Spotlight] Official PyTorch implementation of Vgent ☆45 · Nov 30, 2025 · Updated 5 months ago
- ✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis ☆768 · Dec 8, 2025 · Updated 5 months ago
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆205 · Jun 18, 2025 · Updated 11 months ago
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning" ☆47 · Feb 19, 2026 · Updated 3 months ago
- Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks (KDD 2023) ☆28 · Feb 16, 2024 · Updated 2 years ago
- [NeurIPS 2024] Artemis: Towards Referential Understanding in Complex Videos ☆27 · Apr 8, 2025 · Updated last year
- Comprehensive benchmark for video text understanding ☆28 · Jun 4, 2025 · Updated 11 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆213 · Jan 6, 2025 · Updated last year
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention ☆66 · Aug 30, 2025 · Updated 8 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆69 · Jun 9, 2024 · Updated last year
- ✨First Open-Source R1-like Video-LLM [2025/02/18] ☆384 · Feb 23, 2025 · Updated last year
- ☆52 · Oct 20, 2025 · Updated 7 months ago
- ☆157 · Oct 31, 2024 · Updated last year
- Extending context length of visual language models ☆12 · Dec 18, 2024 · Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆281 · Jun 25, 2024 · Updated last year
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆95 · Jul 13, 2025 · Updated 10 months ago
- Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal LLMs ☆59 · Apr 16, 2026 · Updated last month