showlab / videollm-online

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

☆588

Alternatives and similar repositories for videollm-online

Users who are interested in videollm-online are comparing it to the libraries listed below.

keshik6 / HourVideo
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
☆159 · Updated 4 months ago
yfzhang114 / Thyme
Think Beyond Images
☆516 · Updated 2 months ago
Ola-Omni / Ola
Ola: Pushing the Frontiers of Omni-Modal Language Model
☆378 · Updated 5 months ago
ShareGPT4Omni / ShareGPT4Video
[NeurIPS 2024] An official implementation of "ShareGPT4Video: Improving Video Understanding and Generation with Better Captions"
☆1,079 · Updated last year
IVGSZ / Flash-VStream
This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams"
☆247 · Updated last month
YueFan1014 / VideoAgent
This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)
☆272 · Updated 11 months ago
dvlab-research / VisionThink
[NeurIPS 2025] Efficient Reasoning Vision Language Models
☆415 · Updated 2 months ago
MME-Benchmarks / Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
☆683 · Updated 3 months ago
FoundationVision / Groma
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
☆580 · Updated last year
NVlabs / Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
☆898 · Updated 3 weeks ago
alibaba-damo-academy / PixelRefer
The code for PixelRefer & VideoRefer
☆320 · Updated last week
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆745 · Updated 2 months ago
Wiselnn570 / VideoRoPE
[ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++
☆204 · Updated 3 months ago
OpenGVLab / VideoChat-Flash
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
☆483 · Updated this week
OpenGVLab / VideoChat-R1
[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
☆227 · Updated last month
xiaoachen98 / Open-LLaVA-NeXT
An open-source implementation for training LLaVA-NeXT.
☆427 · Updated last year
apple / ml-slowfast-llava
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
☆283 · Updated last year
yfzhang114 / r1_reward
✨✨R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
☆270 · Updated 6 months ago
RenShuhuai-Andy / TimeChat
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
☆398 · Updated 6 months ago
bytedance / tarsier
Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with g…
☆500 · Updated 3 months ago
VectorSpaceLab / Video-XL
🔥🔥First-ever hour scale video understanding models
☆575 · Updated 4 months ago
ZiyuGuo99 / Image-Generation-CoT
[CVPR 2025] The First Investigation of CoT Reasoning (RL, TTS, Reflection) in Image Generation
☆829 · Updated 6 months ago
Oryx-mllm / Oryx
[ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
☆328 · Updated 4 months ago
dvlab-research / Lyra
[ICCV 2025] Official Implementation for "Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition"
☆301 · Updated 10 months ago
EvolvingLMMs-Lab / LongVA
Long Context Transfer from Language to Vision
☆397 · Updated 8 months ago
Leon1207 / Video-RAG-master
✨✨[NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehensi…
☆343 · Updated 3 weeks ago
shenyunhang / APE
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
☆598 · Updated last year
Mark12Ding / Dispider
[CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
☆145 · Updated 8 months ago
Wang-Xiaodong1899 / Open-R1-Video
✨First Open-Source R1-like Video-LLM [2025/02/18]
☆373 · Updated 9 months ago
Hui-design / TSPO
[AAAI 2026] ✨ TSPO: Temporal Sampling Policy Optimization for Long-form Video Language Understanding
☆94 · Updated last week