A curated collection of papers and open-source code repositories dedicated to the application of Vision-Language Models (VLMs) for streaming video.
☆146 · Apr 13, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for Awesome-VLM-Streaming-Video
Users who are interested in Awesome-VLM-Streaming-Video are comparing it to the repositories listed below.
- [ICLR 2026] FOCUS: Efficient Keyframe Selection for Long Video Understanding ☆64 · Updated this week
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight) ☆23 · Aug 1, 2025 · Updated 8 months ago
- [CVPR 2026 Highlight] VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding ☆116 · Apr 17, 2026 · Updated last week
- [ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting" ☆43 · Oct 9, 2025 · Updated 6 months ago
- Awesome papers for affective computing with LLMs and MLLMs ☆22 · Nov 26, 2025 · Updated 5 months ago
- [Nature Communications] LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal C… ☆25 · Apr 21, 2026 · Updated last week
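- Course materials from the Yanqi Lake campus of the University of Chinese Academy of Sciences (UCAS) for 2024~2025, including Reinforcement Learning, Intelligent Computing Systems, Pattern Recognition, Matrix Analysis and Applications, Principles and Algorithms of Artificial Intelligence, and Natural Language Processing ☆41 · Sep 22, 2025 · Updated 7 months ago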
- [CVPR2022] Official Implementation of the paper 'Learning Where to Learn in Cross-View Self-Supervised Learning' ☆29 · Oct 12, 2022 · Updated 3 years ago
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos ☆35 · May 27, 2025 · Updated 11 months ago
- ☆18 · Apr 4, 2025 · Updated last year
- Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation ☆19 · Nov 28, 2022 · Updated 3 years ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)" ☆25 · Feb 2, 2025 · Updated last year
- ☆16 · Jan 6, 2025 · Updated last year
- [CVPR2025] "AniMo: Species-Aware Model for Text-Driven Animal Motion Generation" ☆46 · Oct 8, 2025 · Updated 6 months ago
- Awesome latest models, datasets and benchmarks on streaming/online video understanding. ☆27 · Oct 19, 2025 · Updated 6 months ago
- [ICCV'25] HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics ☆37 · Sep 10, 2025 · Updated 7 months ago
- ☆11 · Oct 4, 2023 · Updated 2 years ago
- Official code repository for the paper A Large-scale AI-generated Image Inpainting Benchmark ☆16 · Jan 13, 2026 · Updated 3 months ago
- [NeurIPS 2025] Deep Memory Backtracking for Long Video Understanding ☆66 · Feb 10, 2026 · Updated 2 months ago
- Official training code for MUG-V 10B video generation model. Built on Megatron-LM (v0.14.0) with production-ready distributed training fo… ☆20 · Oct 20, 2025 · Updated 6 months ago
- ☆27 · Apr 25, 2022 · Updated 4 years ago
- ☆49 · Apr 7, 2026 · Updated 3 weeks ago
- [CVPR'25] Attention IoU: Examining Biases in CelebA using Attention Maps ☆12 · Mar 26, 2025 · Updated last year
- Code for downloading videos from HowTo100M dataset ☆17 · May 13, 2021 · Updated 4 years ago
- ☆19 · Jun 10, 2025 · Updated 10 months ago
- Gifts for landscape photographers. Helps the photographer search for meteors in a photo sequence. ☆13 · Jun 21, 2022 · Updated 3 years ago
- OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams ☆90 · Mar 15, 2026 · Updated last month
- [ICLR 2026] MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning ☆29 · Jan 14, 2026 · Updated 3 months ago
- ☆13 · Apr 23, 2025 · Updated last year
- [ACL2026] Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark ☆24 · Apr 13, 2026 · Updated 2 weeks ago
- ☆15 · Feb 24, 2022 · Updated 4 years ago
- Official repository for "Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space" ☆18 · Jan 27, 2026 · Updated 3 months ago
- Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection ☆31 · Mar 19, 2025 · Updated last year
- [ICML 2025 Spotlight] RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding ☆22 · Mar 2, 2025 · Updated last year
- A collection of resources and papers on Large Language Models in autonomous driving ☆27 · Oct 30, 2023 · Updated 2 years ago
- PyTorch implementation of the TPAMI paper "HiGCIN: Hierarchical Graph-based Cross Inference Network for Group Activity Recognition" ☆18 · Dec 10, 2020 · Updated 5 years ago
- SceneCompleter: Dense 3D Scene Completion for Generative Novel View Synthesis ☆36 · Jun 13, 2025 · Updated 10 months ago
- Datasets list for various computer vision tasks ☆16 · Sep 7, 2019 · Updated 6 years ago
- ☆46 · Nov 1, 2025 · Updated 5 months ago