VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆57 · Mar 9, 2025 · Updated last year
Alternatives and similar repositories for VideoNIAH
Users that are interested in VideoNIAH are comparing it to the libraries listed below.
- 🔥🔥 MLVU: Multi-task Long Video Understanding Benchmark · ☆262 · Apr 13, 2026 · Updated 2 months ago
- Official code of *Towards Event-oriented Long Video Understanding* · ☆12 · Jul 26, 2024 · Updated last year
- ☆40 · Nov 8, 2024 · Updated last year
- [Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. · ☆131 · Jul 27, 2024 · Updated last year
- Official code for the ICLR 2025 paper, "Ada-K Routing: Boosting the Efficiency of MoE-based LLMs" · ☆12 · Mar 1, 2025 · Updated last year
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … · ☆132 · Apr 4, 2025 · Updated last year
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025] · ☆21 · Feb 27, 2025 · Updated last year
- [EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models. · ☆28 · Nov 18, 2025 · Updated 7 months ago
- [ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark · ☆144 · Jul 9, 2025 · Updated 11 months ago
- ☆18 · Jul 10, 2024 · Updated last year
- MR. Video: MapReduce is the Principle for Long Video Understanding · ☆31 · Jun 18, 2026 · Updated last week
- Long Context Transfer from Language to Vision · ☆408 · Mar 18, 2025 · Updated last year
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… · ☆126 · Nov 25, 2024 · Updated last year
- Official code for CVPR 2024 paper, "SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models" · ☆16 · Apr 22, 2024 · Updated 2 years ago
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" · ☆154 · Sep 10, 2024 · Updated last year
- ☆117 · Dec 30, 2024 · Updated last year
- ✨✨ The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio · ☆54 · Jul 11, 2025 · Updated 11 months ago
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning · ☆37 · May 9, 2026 · Updated last month
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task · ☆36 · Apr 14, 2025 · Updated last year
- 🔥🔥 First-ever hour scale video understanding models · ☆626 · Jul 14, 2025 · Updated 11 months ago
- [ICCV'25] The official code of paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models" · ☆76 · Jan 13, 2026 · Updated 5 months ago
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely… · ☆55 · Sep 4, 2023 · Updated 2 years ago
- [NeurIPS 2025 Spotlight] Official PyTorch implementation of Vgent · ☆48 · Nov 30, 2025 · Updated 7 months ago
- ✨✨ [CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis · ☆780 · Dec 8, 2025 · Updated 6 months ago
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". · ☆205 · Jun 18, 2025 · Updated last year
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning" · ☆47 · Feb 19, 2026 · Updated 4 months ago
- Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks (KDD 2023) · ☆28 · Feb 16, 2024 · Updated 2 years ago
- [NeurIPS 2024] Artemis: Towards Referential Understanding in Complex Videos · ☆27 · Apr 8, 2025 · Updated last year
- Comprehensive benchmark for video text understanding · ☆29 · Jun 4, 2025 · Updated last year
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture · ☆211 · Jan 6, 2025 · Updated last year
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention · ☆66 · Aug 30, 2025 · Updated 10 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant · ☆69 · Jun 9, 2024 · Updated 2 years ago
- ✨ First Open-Source R1-like Video-LLM [2025/02/18] · ☆383 · Feb 23, 2025 · Updated last year
- ☆52 · Oct 20, 2025 · Updated 8 months ago
- ☆158 · Oct 31, 2024 · Updated last year
- Extending context length of visual language models · ☆12 · Dec 18, 2024 · Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model · ☆281 · Jun 25, 2024 · Updated 2 years ago
- [ECCV 2026] Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? · ☆94 · Jul 13, 2025 · Updated 11 months ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment · ☆29 · Sep 27, 2024 · Updated last year