showlab / MovieSeqLinks

[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences

☆40

Alternatives and similar repositories for MovieSeq

Users that are interested in MovieSeq are comparing it to the libraries listed below

Sorting:

whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆65Updated last year
haoyu-bu / CAFe
Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"
☆25Updated 8 months ago
Hritikbansal / videocon
☆58Updated last year
rxtan2 / Koala-video-llm
☆36Updated last year
aszala / VPEval
VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆45Updated 2 years ago
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆72Updated last year
joez17 / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆51Updated 8 months ago
Liuziyu77 / MIA-DPO
Official implement of MIA-DPO
☆67Updated 10 months ago
Share14 / ShareGemini
☆32Updated last year
jiyt17 / IDA-VLM
[ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
☆36Updated last year
TencentARC / Video-Holmes
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆78Updated 4 months ago
PKU-YuanGroup / LLMBind
LLMBind: A Unified Modality-Task Integration Framework
☆18Updated last year
TencentARC / GRPO-CARE
☆79Updated 5 months ago
Dongping-Chen / ISG
(ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.
☆31Updated 3 months ago
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58Updated 2 years ago
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆90Updated last year
DCDmllm / Momentor
☆80Updated last year
j-min / VPGen
Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆57Updated 2 years ago
appletea233 / Temporal-R1
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
☆58Updated 5 months ago
sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆32Updated 8 months ago
ChocoWu / SeTok
Codes for ICLR 2025 Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM
☆75Updated 7 months ago
xjtupanda / Sparrow
Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"
☆48Updated 3 months ago
dhg-wei / TOPA
(NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment
☆31Updated last year
wjpoom / SPEC
[CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding"
☆49Updated 5 months ago
showlab / cosmo
☆73Updated last year
ruili33 / TPO
☆39Updated 2 months ago
bigai-nlco / VideoLLaMB
[ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges
☆78Updated 9 months ago
CeeZh / SILVR
Official Implementation for "SiLVR : A Simple Language-based Video Reasoning Framework"
☆19Updated 2 months ago
mu-cai / TemporalBench
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
☆37Updated last year
V-STaR-Bench / V-STaR
Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
☆35Updated 4 months ago