[ICCV 2025] StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
☆68Jun 25, 2025Updated 10 months ago
Alternatives and similar repositories for StreamMind
Users that are interested in StreamMind are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)☆659Nov 26, 2025Updated 5 months ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025☆106Mar 14, 2025Updated last year
- [CVPR 2025]Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction☆174Mar 23, 2025Updated last year
- [CVPR 2025] LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant☆30Dec 2, 2025Updated 5 months ago
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?☆142Jul 24, 2025Updated 9 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- ☆19Aug 7, 2025Updated 9 months ago
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)☆446Oct 29, 2025Updated 6 months ago
- Official implementation of paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact…☆44Feb 5, 2025Updated last year
- Pytorch Implementation of ECCV'22 paper: Video Activity Localisation with Uncertainties in Temporal Boundary☆17Jul 17, 2022Updated 3 years ago
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning☆43Mar 2, 2026Updated 2 months ago
- [CVPR 2025] Online Video Understanding: OVBench and VideoChat-Online☆94Oct 7, 2025Updated 7 months ago
- [ICCV 2025] Official Implementation of "Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation". Junyu Xie, Tengda H…☆22Updated this week
- The website of the Objects365 Dataset☆13Jun 15, 2023Updated 2 years ago
- This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams"☆279Oct 15, 2025Updated 7 months ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- ☆16May 30, 2025Updated 11 months ago
- StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding☆156May 16, 2025Updated last year
- ☆53Feb 25, 2026Updated 2 months ago
- The implementation of RAGSynth: Synthetic Data for Robust and Faithful RAG Component Optimization☆21May 26, 2025Updated 11 months ago
- Official implementation for our paper: Rethinking Video Tokenization: A Conditioned Diffusion-based Approach☆13Apr 2, 2025Updated last year
- Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding" [ACL 2026]☆84May 8, 2026Updated last week
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant☆424Mar 19, 2025Updated last year
- Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"☆29Apr 16, 2024Updated 2 years ago
- (ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"☆50Jul 1, 2025Updated 10 months ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Code release for the paper "Progress-Aware Video Frame Captioning" (CVPR 2025)☆23Jul 16, 2025Updated 10 months ago
- [CVPR 2025] LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding☆84Jul 4, 2025Updated 10 months ago
- [ICCV 2025] VLM4D: Towards Spatiotemporal Awareness in Vision Language Models☆47Nov 20, 2025Updated 6 months ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model☆17Feb 13, 2025Updated last year
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆44Mar 11, 2025Updated last year
- Official Implementation of Video-MA2MBA☆12Dec 3, 2024Updated last year
- Repository of Streaming LLMs☆60May 13, 2026Updated last week
- Learning Large-scale Neural Fields via Context Pruned Meta-Learning (NeurIPS 2023)☆28Sep 24, 2023Updated 2 years ago
- We are very happy that our work has been accepted by ACM Multimedia 2024!🥰☆11Jan 8, 2025Updated last year
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model'☆33Nov 7, 2023Updated 2 years ago
- [2023 ACL] CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding☆31Aug 5, 2023Updated 2 years ago
- Entity-Aware and Motion-Aware Transformers for Language-driven Action Localization(IJCAI-22)☆12Oct 11, 2022Updated 3 years ago
- Official code for the paper "Self-Distillation for Few-Shot Image Captioning"☆18Mar 15, 2021Updated 5 years ago
- Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model☆89Nov 27, 2025Updated 5 months ago
- Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement [ACL 2026 Findings]"☆50Apr 7, 2026Updated last month
- ☆18Oct 28, 2025Updated 6 months ago