CASIA-IVA-Lab/VideoNIAH

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/CASIA-IVA-Lab/VideoNIAH)

CASIA-IVA-Lab / VideoNIAH

VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs

☆57

Alternatives and similar repositories for VideoNIAH

Users that are interested in VideoNIAH are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

JUNJIE99 / MLVU
View on GitHub
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
☆255Apr 13, 2026Updated last month
RUCAIBox / Event-Bench
View on GitHub
Official code of *Towards Event-oriented Long Video Understanding*
☆12Jul 26, 2024Updated last year
yale-nlp / TOMATO
View on GitHub
☆37Nov 8, 2024Updated last year
longvideobench / LongVideoBench
View on GitHub
[Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆125Jul 27, 2024Updated last year
ivattyue / Ada-K
View on GitHub
Official code for the ICLR 2025 paper, "Ada-K Routing: Boosting the Efficiency of MoE-based LLMs"
☆12Mar 1, 2025Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
llyx97 / TempCompass
View on GitHub
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆131Apr 4, 2025Updated last year
TIGER-AI-Lab / VISTA
View on GitHub
The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]
☆21Feb 27, 2025Updated last year
CASIA-IVA-Lab / VRoPE
View on GitHub
[EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models.
☆27Nov 18, 2025Updated 6 months ago
zai-org / LVBench
View on GitHub
[ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark
☆144Jul 9, 2025Updated 10 months ago
lscpku / VITATECS
View on GitHub
☆18Jul 10, 2024Updated last year
ziqipang / MR-Video
View on GitHub
MR. Video: MapReduce is the Principle for Long Video Understanding
☆31Apr 23, 2025Updated last year
EvolvingLMMs-Lab / LongVA
View on GitHub
Long Context Transfer from Language to Vision
☆403Mar 18, 2025Updated last year
OpenGVLab / MM-NIAH
View on GitHub
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…
☆125Nov 25, 2024Updated last year
CASIA-IVA-Lab / SC-Tune
View on GitHub
Official code for CVPR 2024 paper, "SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models"
☆16Apr 22, 2024Updated 2 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
TencentARC / ST-LLM
View on GitHub
[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
☆155Sep 10, 2024Updated last year
egoschema / EgoSchema
View on GitHub
☆111Dec 30, 2024Updated last year
DAMO-NLP-SG / CMM
View on GitHub
✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
☆53Jul 11, 2025Updated 10 months ago
zhishuifeiqian / VCR-Bench
View on GitHub
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
☆37May 9, 2026Updated last week
haonan3 / V1
View on GitHub
V1: Toward Multimodal Reasoning by Designing Auxiliary Task
☆36Apr 14, 2025Updated last year
VectorSpaceLab / Video-XL
View on GitHub
🔥🔥First-ever hour scale video understanding models
☆622Jul 14, 2025Updated 10 months ago
thu-nics / FrameFusion
View on GitHub
[ICCV'25] The official code of paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models"
☆74Jan 13, 2026Updated 4 months ago
CASIA-IVA-Lab / ChatBridge
View on GitHub
ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely…
☆55Sep 4, 2023Updated 2 years ago
xiaoqian-shen / Vgent
View on GitHub
[NeurIPS 2025 Spotlight] Official PyTorch implementation of Vgent
☆45Nov 30, 2025Updated 5 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
MME-Benchmarks / Video-MME
View on GitHub
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
☆768Dec 8, 2025Updated 5 months ago
Yxxxb / VoCo-LLaMA
View on GitHub
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆205Jun 18, 2025Updated 11 months ago
Lizw14 / Super-CLEVR
View on GitHub
Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning"
☆47Feb 19, 2026Updated 3 months ago
PeterGriffinJin / Heterformer
View on GitHub
Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks (KDD 2023)
☆28Feb 16, 2024Updated 2 years ago
qiujihao19 / Artemis
View on GitHub
[NeurIPS 2024] Artemis: Towards Referential Understanding in Complex Videos
☆27Apr 8, 2025Updated last year
shuyansy / VidText
View on GitHub
Comprehensive benchmark for video text understanding
☆28Jun 4, 2025Updated 11 months ago
FreedomIntelligence / LongLLaVA
View on GitHub
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆213Jan 6, 2025Updated last year
xing0047 / cca-llava
View on GitHub
[NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention
☆66Aug 30, 2025Updated 8 months ago
whwu95 / FreeVA
View on GitHub
FreeVA: Offline MLLM as Training-Free Video Assistant
☆69Jun 9, 2024Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
Wang-Xiaodong1899 / Open-R1-Video
View on GitHub
✨First Open-Source R1-like Video-LLM [2025/02/18]
☆384Feb 23, 2025Updated last year
Weiyun1025 / verl-internvl
View on GitHub
☆52Oct 20, 2025Updated 7 months ago
RifleZhang / LLaVA-Hound-DPO
View on GitHub
☆157Oct 31, 2024Updated last year
kiaia / GIRAFFE
View on GitHub
Extending context length of visual language models
☆12Dec 18, 2024Updated last year
FreedomIntelligence / ALLaVA
View on GitHub
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆281Jun 25, 2024Updated last year
TencentARC / Video-Holmes
View on GitHub
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆95Jul 13, 2025Updated 10 months ago
The-Martyr / Awesome-Multimodal-Reasoning
View on GitHub
Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal LLMs
☆59Apr 16, 2026Updated last month