mit-han-lab / streaming-vlm
StreamingVLM: Real-Time Understanding for Infinite Video Streams
☆828 Updated 3 months ago
Alternatives and similar repositories for streaming-vlm
Users interested in streaming-vlm are comparing it to the repositories listed below.
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction ☆155 Updated 9 months ago
- Cambrian-S: Towards Spatial Supersensing in Video ☆475 Updated 3 weeks ago
- OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language. ☆625 Updated 2 months ago
- Scaling Vision Pre-Training to 4K Resolution ☆217 Updated 2 weeks ago
- The official repository of InfiniteVL ☆71 Updated last month
- Native Multimodal Models are World Learners ☆1,399 Updated 2 weeks ago
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ☆376 Updated 10 months ago
- DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models ☆161 Updated 2 weeks ago
- Visual Planning: Let's Think Only with Images ☆294 Updated 8 months ago
- Official implementation of "Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs". ☆95 Updated 2 months ago
- Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the … ☆651 Updated 2 weeks ago
- [NeurIPS 2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning ☆253 Updated 3 months ago
- video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d… ☆136 Updated 3 weeks ago
- An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning" ☆175 Updated 3 weeks ago
- NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by StepFun’s Multimodal Intellige… ☆597 Updated 3 weeks ago
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence" ☆128 Updated last month
- A Large-scale Video Action Dataset ☆162 Updated this week
- [ICCV 2025] Video-T1: Test-Time Scaling for Video Generation ☆304 Updated 6 months ago
- [NeurIPS 2025] Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance ☆519 Updated 2 weeks ago
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆166 Updated 3 months ago
- [ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning ☆413 Updated last month
- Long Context Transfer from Language to Vision ☆398 Updated 10 months ago
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆422 Updated this week
- Official Implementation for our NeurIPS 2024 paper, "Don't Look Twice: Run-Length Tokenization for Faster Video Transformers". ☆232 Updated 9 months ago
- Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with g… ☆512 Updated 5 months ago
- [ACL 2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆114 Updated 5 months ago
- [ICML 2025] Official PyTorch implementation of LongVU ☆417 Updated 8 months ago
- ☆304 Updated this week
- Long-RL: Scaling RL to Long Sequences (NeurIPS 2025) ☆684 Updated 3 months ago
- 💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning ☆298 Updated 3 months ago