Vision-CAIR / InfiniBench

Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows

☆19

Alternatives and similar repositories for InfiniBench

Users who are interested in InfiniBench are comparing it to the repositories listed below

mlvlab / DeepVideoR1
[NeurIPS25] Official Implementation (PyTorch) of "DeepVideo-R1"
☆31Updated 2 months ago
showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆42Updated 10 months ago
sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆34Updated 11 months ago
OpenGVLab / PVC
[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
☆51Updated 7 months ago
TIGER-AI-Lab / VISTA
The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]
☆20Updated 11 months ago
Andy-Cheng / TEMPURA
TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u…
☆25Updated 8 months ago
inclusionAI / M2-Reasoning
M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning
☆46Updated 6 months ago
TencentARC / Video-Holmes
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆86Updated 6 months ago
TencentARC / GRPO-CARE
☆81Updated 7 months ago
Hon-Wong / ByteVideoLLM
[ICCV 2025] Dynamic-VLM
☆28Updated last year
haoyu-bu / CAFe
Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"
☆32Updated 10 months ago
eric-ai-lab / MMWorld
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
☆28Updated 6 months ago
locuslab / llava-token-compression
☆46Updated last year
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆77Updated last year
Share14 / ShareGemini
☆32Updated last year
marinero4972 / CyberV
☆18Updated 7 months ago
Aurora-slz / MM-Verify
☆18Updated 3 months ago
UW-Madison-Lee-Lab / CoBSAT
Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"
☆42Updated 8 months ago
Liuziyu77 / MIA-DPO
Official implementation of MIA-DPO
☆70Updated last year
yliu-cs / PiTe
[ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model
☆17Updated 11 months ago
HongbangYuan / OmniReward
☆39Updated last month
yunlong10 / CAT-V
[AAAI 26 Demo] Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P…
☆63Updated last week
IVUL-KAUST / VideoAuto-R1
VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
☆61Updated 3 weeks ago
jiyt17 / IDA-VLM
[ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
☆37Updated last year
path2generalist / General-Level
On Path to Multimodal Generalist: General-Level and General-Bench
☆19Updated 6 months ago
FatemehShiri / Spatial-MM
☆12Updated last year
Dongping-Chen / ISG
(ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.
☆31Updated 6 months ago
EvolvingLMMs-Lab / MGPO
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
☆52Updated 6 months ago
Yu-xm / ReVision
Modality Gap–Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
☆24Updated this week
RenShuhuai-Andy / NBP
Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
☆41Updated 11 months ago