VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
⭐54 · Mar 9, 2025 · Updated 11 months ago
Alternatives and similar repositories for VideoNIAH
Users that are interested in VideoNIAH are comparing it to the libraries listed below
- 🔥🔥MLVU: Multi-task Long Video Understanding Benchmark · ⭐241 · Aug 21, 2025 · Updated 6 months ago
- ⭐37 · Nov 8, 2024 · Updated last year
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … · ⭐129 · Apr 4, 2025 · Updated 10 months ago
- [NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. · ⭐113 · Jul 27, 2024 · Updated last year
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]β21Feb 27, 2025Updated last year
- [ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmarkβ137Jul 9, 2025Updated 7 months ago
- Official code of *Towards Event-oriented Long Video Understanding*β12Jul 26, 2024Updated last year
- MR. Video: MapReduce is the Principle for Long Video Understandingβ30Apr 23, 2025Updated 10 months ago
- β18Jul 10, 2024Updated last year
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoningβ35Jul 15, 2025Updated 7 months ago
- Long Context Transfer from Language to Visionβ402Mar 18, 2025Updated 11 months ago
- [NeurIPS 2025 Spotlight] Official PyTorch implementation of Vgent · ⭐40 · Nov 30, 2025 · Updated 3 months ago
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… · ⭐123 · Nov 25, 2024 · Updated last year
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" · ⭐151 · Sep 10, 2024 · Updated last year
- ⭐109 · Dec 30, 2024 · Updated last year
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task · ⭐36 · Apr 14, 2025 · Updated 10 months ago
- ✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis · ⭐731 · Dec 8, 2025 · Updated 2 months ago
- 🔥🔥First-ever hour-scale video understanding models · ⭐610 · Jul 14, 2025 · Updated 7 months ago
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio · ⭐52 · Jul 11, 2025 · Updated 7 months ago
- [EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" · ⭐107 · Aug 21, 2025 · Updated 6 months ago
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model · ⭐281 · Jun 25, 2024 · Updated last year
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" · ⭐11 · Oct 11, 2024 · Updated last year
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning" · ⭐47 · Feb 19, 2026 · Updated last week
- [CVPR 2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". · ⭐203 · Jun 18, 2025 · Updated 8 months ago
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs" · ⭐48 · Sep 3, 2025 · Updated 5 months ago
- ✨First Open-Source R1-like Video-LLM [2025/02/18] · ⭐381 · Feb 23, 2025 · Updated last year
- ⭐360 · Jan 27, 2024 · Updated 2 years ago
- Code for "Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning [EMNLP 2025 Finding]"β15Aug 27, 2025Updated 6 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture · ⭐213 · Jan 6, 2025 · Updated last year
- [ACL 2024] TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild · ⭐47 · Sep 19, 2023 · Updated 2 years ago
- [ICML 2024] MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI · ⭐116 · Jul 18, 2024 · Updated last year
- [IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning · ⭐81 · Dec 6, 2024 · Updated last year
- [ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges · ⭐83 · Feb 27, 2025 · Updated last year
- ⭐155 · Oct 31, 2024 · Updated last year
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark · ⭐138 · Jun 4, 2025 · Updated 8 months ago
- [EMNLP 2025] The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" · ⭐38 · Feb 1, 2026 · Updated last month
- [COLING 2025🔥] Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection · ⭐17 · Jan 21, 2025 · Updated last year
- [NeurIPS 2024] Artemis: Towards Referential Understanding in Complex Videos · ⭐27 · Apr 8, 2025 · Updated 10 months ago
- A Massive Multi-Discipline Lecture Understanding Benchmark · ⭐32 · Nov 1, 2025 · Updated 4 months ago