HumanMLLM/ViSpeak

(ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"

☆53

Alternatives and similar repositories for ViSpeak

Users who are interested in ViSpeak are comparing it to the repositories listed below.

pro-assist / ProAssist
☆20 · Jul 21, 2025 · Updated last year
hmxiong / StreamChat
Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025
☆111 · Mar 14, 2025 · Updated last year
HumanMLLM / LOVE-R1
Official repository of paper "LOVE-R1: Advancing Long Video Understanding with Adaptive Zoom-in Mechanism via Multi-Step Reasoning"
☆24 · Nov 1, 2025 · Updated 8 months ago
OmniMMI / OmniMMI
[CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
☆23 · Jul 14, 2026 · Updated last week
JoeLeelyf / OVO-Bench
[CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
☆154 · Jul 24, 2025 · Updated last year
apple / ml-streambridge
☆40 · Nov 5, 2025 · Updated 8 months ago
daeunni / StreamGaze
Code for "StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos"
☆26 · May 13, 2026 · Updated 2 months ago
yellow-binary-tree / MMDuet2
[ICLR 2026] MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning
☆41 · Jan 14, 2026 · Updated 6 months ago
yellow-binary-tree / MMDuet
Official implementation of paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact…
☆45 · Feb 5, 2025 · Updated last year
Mark12Ding / Dispider
[CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
☆180 · Mar 23, 2025 · Updated last year
yaolinli / TimeChat-Online
[ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
☆132 · Jun 29, 2026 · Updated 3 weeks ago
air-embodied-brain / Em-Garde
Implementation of Em_Garde: a proposal-retrieval framework for streaming video understanding
☆26 · Jun 24, 2026 · Updated last month
THUNLP-MT / StreamingBench
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding
☆167 · May 16, 2025 · Updated last year
daeunni / Video-Skill-CoT
Code for "Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning [EMNLP 2025 Findings]"
☆18 · Aug 27, 2025 · Updated 10 months ago
yellow-binary-tree / ProactiveVideoQA
ProactiveBench: A Comprehensive Benchmark for VideoLLM Proactive Interaction Evaluation
☆18 · Jan 8, 2026 · Updated 6 months ago
xinding-bot / StreamMind
[ICCV 2025] StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
☆73 · Jun 25, 2025 · Updated last year
JPShi12 / VideoLoom
[ICML 2026] VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding
☆27 · Jul 3, 2026 · Updated 3 weeks ago
IVGSZ / Flash-VStream
Official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams"
☆287 · Oct 15, 2025 · Updated 9 months ago
iSEE-Laboratory / Seg-ReSearch
(ICML 2026) Seg-ReSearch: Segmentation with Interleaved Reasoning and External Search
☆47 · May 1, 2026 · Updated 2 months ago
maifoundations / Streamo
Streaming Video Instruction Tuning
☆79 · Feb 25, 2026 · Updated 4 months ago
iLearn-Lab / CVPR25-LION-FS
[CVPR 2025] LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
☆29 · Dec 2, 2025 · Updated 7 months ago
MCG-NJU / StreamForest
[NeurIPS 2025 Spotlight] StreamForest: Efficient Online Video Understanding with Persistent Event Memory
☆133 · Nov 4, 2025 · Updated 8 months ago
iSEE-Laboratory / HD-OVD
(TMM 2025) Official repository of paper "A Hierarchical Semantic Distillation Framework for Open-Vocabulary Object Detection"
☆27 · Mar 14, 2025 · Updated last year
Yui010206 / MEXA
[EMNLP 2025 Findings] MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation
☆15 · Aug 22, 2025 · Updated 11 months ago
HumanMLLM / HumanOmniV2
☆161 · Jul 31, 2025 · Updated 11 months ago
caojiaolong / Awesome-Mamba
Collect papers about Mamba (a selective state space model).
☆15 · Aug 6, 2024 · Updated last year
AdaCheng / VidEgoThink
The official code and data for paper "VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI"
☆18 · Mar 25, 2025 · Updated last year
yliu-cs / PiTe
[ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model
☆17 · Feb 13, 2025 · Updated last year
EvolvingLMMs-Lab / SimpleStream
A simple video streaming baseline that outperforms SOTAs.
☆151 · May 1, 2026 · Updated 2 months ago
HuiGuanLab / RaTSG
This repository contains the implementation of our NeurIPS'24 paper "Temporal Sentence Grounding with Relevance Feedback in Videos"
☆13 · Aug 22, 2025 · Updated 11 months ago
showlab / videollm-online
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
☆677 · Nov 26, 2025 · Updated 7 months ago
Hongcheng-Gao / HAVEN
Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".
☆25 · Oct 22, 2025 · Updated 9 months ago
HumanMLLM / HumanOmni
HumanOmni
☆240 · Mar 10, 2025 · Updated last year
HumanMLLM / IRG-MotionLLM
(ECCV2026) Official repository of paper "IRG-MotionLLM: Interleaving Motion Generation, Assessment and Refinement for Text-to-Motion Gene…
☆30 · Jul 1, 2026 · Updated 3 weeks ago
mll-lab-nu / TStar
TStar is a unified temporal search framework for long-form video question answering
☆97 · Mar 23, 2026 · Updated 4 months ago
lzyhha / HSSL
Enhancing Representations through Heterogeneous Self-Supervised Learning (TPAMI 2025)
☆15 · May 2, 2025 · Updated last year
iSEE-Laboratory / BPF
(ECCV 2024) Official repository of paper "Bridge Past and Future: Overcoming Information Asymmetry in Incremental Object Detection"
☆20 · Mar 26, 2025 · Updated last year
MajorDavidZhang / Generalization_unified_VLM
☆24 · May 23, 2025 · Updated last year
Adam-duan / DiffRetouch
[AAAI2025] This is the official PyTorch code for the paper: "DiffRetouch: Using Diffusion to Retouch on the Shoulder of Experts"
☆25 · Jun 16, 2025 · Updated last year