xinding-sys/StreamMind

[ICCV 2025] StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition

☆69

Alternatives and similar repositories for StreamMind

Users who are interested in StreamMind are comparing it to the repositories listed below.

showlab / videollm-online
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
☆669 • Nov 26, 2025 • Updated 6 months ago
hmxiong / StreamChat
Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025
☆108 • Mar 14, 2025 • Updated last year
Mark12Ding / Dispider
[CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
☆178 • Mar 23, 2025 • Updated last year
iLearn-Lab / CVPR25-LION-FS
[CVPR 2025] LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
☆30 • Dec 2, 2025 • Updated 6 months ago
JoeLeelyf / OVO-Bench
[CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
☆143 • Jul 24, 2025 • Updated 10 months ago
zoezheng126 / Spatio-Temporal-LLM
☆19 • Aug 7, 2025 • Updated 10 months ago
yellow-binary-tree / MMDuet
Official implementation of paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact…
☆43 • Feb 5, 2025 • Updated last year
Raymond-sci / EMB
PyTorch Implementation of ECCV'22 paper: Video Activity Localisation with Uncertainties in Temporal Boundary
☆17 • Jul 17, 2022 • Updated 3 years ago
V-STaR-Bench / V-STaR
Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
☆43 • Mar 2, 2026 • Updated 3 months ago
MCG-NJU / VideoChat-Online
[CVPR 2025] Online Video Understanding: OVBench and VideoChat-Online
☆94 • Oct 7, 2025 • Updated 8 months ago
Jyxarthur / shot-by-shot
[ICCV 2025] Official Implementation of "Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation". Junyu Xie, Tengda H…
☆21 • May 16, 2026 • Updated 3 weeks ago
IVGSZ / Flash-VStream
This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams"
☆281 • Oct 15, 2025 • Updated 7 months ago
qishisuren123 / AnyCap
A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca…
☆54 • Jul 24, 2025 • Updated 10 months ago
FrankYang-17 / Mavors
☆16 • May 30, 2025 • Updated last year
THUNLP-MT / StreamingBench
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding
☆160 • May 16, 2025 • Updated last year
stepfun-ai / GEBench
☆54 • Feb 25, 2026 • Updated 3 months ago
EachSheep / RAGSynth
The implementation of RAGSynth: Synthetic Data for Robust and Faithful RAG Component Optimization
☆21 • May 26, 2025 • Updated last year
ali-vilab / CDT
Official implementation for our paper: Rethinking Video Tokenization: A Conditioned Diffusion-based Approach
☆15 • Apr 2, 2025 • Updated last year
haowei-freesky / HERMES
Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding" [ACL 2026]
☆86 • May 8, 2026 • Updated last month
Sid2697 / HOI-Ref
Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"
☆29 • Apr 16, 2024 • Updated 2 years ago
EvolvingLMMs-Lab / EgoLife
[CVPR 2025] EgoLife: Towards Egocentric Life Assistant
☆435 • Mar 19, 2025 • Updated last year
yu-lin-li / DyToK
[NeurIPS 2025] Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior
☆79 • Feb 20, 2026 • Updated 3 months ago
ttgeng233 / UniAV
Unified Audio-Visual Perception for Multi-Task Video Localization
☆31 • Apr 19, 2024 • Updated 2 years ago
Anonymous-012 / SVDP
[AAAI 2024] SVDP: Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction
☆33 • Apr 26, 2024 • Updated 2 years ago
HumanMLLM / ViSpeak
(ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"
☆52 • Jul 1, 2025 • Updated 11 months ago
zihuixue / ProgCaptioner
Code release for the paper "Progress-Aware Video Frame Captioning" (CVPR 2025)
☆25 • Jul 16, 2025 • Updated 10 months ago
Guanzhou-Ke / Knowledge-Bridger
The official repos of "Knowledge Bridger: Towards Training-Free Missing Modality Completion"
☆21 • Jun 30, 2025 • Updated 11 months ago
appletea233 / LLaVA-ST
[CVPR 2025] LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding
☆83 • Jul 4, 2025 • Updated 11 months ago
yliu-cs / PiTe
[ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model
☆17 • Feb 13, 2025 • Updated last year
zhyang2226 / DMBP
[ICLR 2024] DMBP: Diffusion Model-Based Predictor for Robust Offline Reinforcement Learning against State Observations Perturbations.
☆18 • May 24, 2024 • Updated 2 years ago
showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆44 • Mar 11, 2025 • Updated last year
IVY-LVLM / Video-MA2MBA
Official Implementation of Video-MA2MBA
☆12 • Dec 3, 2024 • Updated last year
EIT-NLP / StreamingLLM
Repository of Streaming LLMs
☆71 • Jun 3, 2026 • Updated last week
k4rtik / uchicago-poster
Unofficial Poster Template for UChicago Computer Science
☆14 • Sep 8, 2022 • Updated 3 years ago
HDUyiming / SOCCER
We are very happy that our work has been accepted by ACM Multimedia 2024! 🥰
☆12 • Jan 8, 2025 • Updated last year
Luodian / nano-hevc
A minimal, educational HEVC (H.265) encoder written in Python.
☆53 • Feb 23, 2026 • Updated 3 months ago
Chuhanxx / helping_hand_for_egocentric_videos
Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model'
☆33 • Nov 7, 2023 • Updated 2 years ago
shuoyang129 / eamat
Entity-Aware and Motion-Aware Transformers for Language-driven Action Localization (IJCAI-22)
☆12 • Oct 11, 2022 • Updated 3 years ago
chenxy99 / SD-FSIC
Official code for the paper "Self-Distillation for Few-Shot Image Captioning"
☆18 • Mar 15, 2021 • Updated 5 years ago