VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆55 · Mar 9, 2025 · Updated last year
Alternatives and similar repositories for VideoNIAH
Users that are interested in VideoNIAH are comparing it to the libraries listed below.
- 🔥🔥MLVU: Multi-task Long Video Understanding Benchmark ☆243 · Aug 21, 2025 · Updated 7 months ago
- Official code of *Towards Event-oriented Long Video Understanding* ☆12 · Jul 26, 2024 · Updated last year
- ☆37 · Nov 8, 2024 · Updated last year
- [Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. ☆118 · Jul 27, 2024 · Updated last year
- Official code for the ICLR 2025 paper, "Ada-K Routing: Boosting the Efficiency of MoE-based LLMs" ☆12 · Mar 1, 2025 · Updated last year
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆131 · Apr 4, 2025 · Updated last year
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025] ☆21 · Feb 27, 2025 · Updated last year
- [EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models. ☆27 · Nov 18, 2025 · Updated 4 months ago
- [ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark ☆142 · Jul 9, 2025 · Updated 9 months ago
- ☆18 · Jul 10, 2024 · Updated last year
- MR. Video: MapReduce is the Principle for Long Video Understanding ☆31 · Apr 23, 2025 · Updated 11 months ago
- Long Context Transfer from Language to Vision ☆403 · Mar 18, 2025 · Updated last year
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… ☆124 · Nov 25, 2024 · Updated last year
- Official code for CVPR 2024 paper, "SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models" ☆16 · Apr 22, 2024 · Updated last year
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" ☆154 · Sep 10, 2024 · Updated last year
- ☆110 · Dec 30, 2024 · Updated last year
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio ☆52 · Jul 11, 2025 · Updated 8 months ago
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning ☆36 · Jul 15, 2025 · Updated 8 months ago
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task ☆36 · Apr 14, 2025 · Updated 11 months ago
- 🔥🔥First-ever hour scale video understanding models ☆620 · Jul 14, 2025 · Updated 8 months ago
- [ICCV'25] The official code of paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models" ☆71 · Jan 13, 2026 · Updated 2 months ago
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely… ☆54 · Sep 4, 2023 · Updated 2 years ago
- [NeurIPS 2025 Spotlight] Official PyTorch implementation of Vgent ☆42 · Nov 30, 2025 · Updated 4 months ago
- ✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis ☆746 · Dec 8, 2025 · Updated 4 months ago
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆203 · Jun 18, 2025 · Updated 9 months ago
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning" ☆47 · Feb 19, 2026 · Updated last month
- Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks (KDD 2023) ☆28 · Feb 16, 2024 · Updated 2 years ago
- [NeurIPS 2024] Artemis: Towards Referential Understanding in Complex Videos ☆27 · Apr 8, 2025 · Updated last year
- Comprehensive benchmark for video text understanding ☆28 · Jun 4, 2025 · Updated 10 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆213 · Jan 6, 2025 · Updated last year
- ☆51 · Oct 20, 2025 · Updated 5 months ago
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention ☆66 · Aug 30, 2025 · Updated 7 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆91 · Jul 13, 2025 · Updated 8 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆69 · Jun 9, 2024 · Updated last year
- ✨First Open-Source R1-like Video-LLM [2025/02/18] ☆383 · Feb 23, 2025 · Updated last year
- ☆157 · Oct 31, 2024 · Updated last year
- Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models ☆50 · Oct 30, 2025 · Updated 5 months ago
- Extending context length of visual language models ☆12 · Dec 18, 2024 · Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆281 · Jun 25, 2024 · Updated last year