Go2Heart/StreamFormer

[ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training.

☆93

Alternatives and similar repositories for StreamFormer

Users interested in StreamFormer are comparing it to the repositories listed below.

qirui-chen / RGA3-release
[ICCV 2025] Object-centric Video Question Answering with Visual Grounding and Referring
☆24 • Aug 8, 2025 • Updated 11 months ago
Becomebright / ReKV
[ICLR'25] Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
☆121 • Nov 4, 2025 • Updated 8 months ago
Go2Heart / OmniStream
[ECCV 2026] OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams
☆113 • Mar 15, 2026 • Updated 4 months ago
zhengrongz / AoTD
[CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation".
☆58 • May 25, 2025 • Updated last year
Lzq5 / UniTime
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
☆56 • May 20, 2026 • Updated 2 months ago
qirui-chen / MultiHop-EgoQA
[AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
☆38 • May 27, 2025 • Updated last year
hmxiong / StreamChat
Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025
☆111 • Mar 14, 2025 • Updated last year
Lzq5 / Video-Text-Alignment
☆28 • Jul 18, 2025 • Updated last year
minghangz / OnVTG
Online video temporal grounding
☆16 • Oct 20, 2025 • Updated 9 months ago
Go2Heart / EchoSight
[EMNLP 2024 Findings] The official PyTorch implementation of EchoSight: Advancing Visual-Language Models with Wiki Knowledge.
☆90 • Jan 19, 2026 • Updated 6 months ago
haoningwu3639 / SimpleSDM-Video
A simple and flexible PyTorch implementation of Video StableDiffusion (ZeroScope_v2) based on diffusers.
☆20 • Feb 15, 2024 • Updated 2 years ago
haoningwu3639 / MRGen
[ICCV 2025] MRGen: Segmentation Data Engine for Underrepresented MRI Modalities
☆41 • Sep 26, 2025 • Updated 9 months ago
pangzhan27 / CMeRT
☆18 • Jul 14, 2025 • Updated last year
Becomebright / MTV
Revisiting Multi-Task Visual Representation Learning
☆22 • Jan 21, 2026 • Updated 6 months ago
Becomebright / GroundVQA
Official PyTorch code of GroundVQA (CVPR'24)
☆63 • Sep 13, 2024 • Updated last year
pro-assist / ProAssist
☆20 • Jul 21, 2025 • Updated last year
MCG-NJU / StreamForest
[NeurIPS 2025 Spotlight] StreamForest: Efficient Online Video Understanding with Persistent Event Memory
☆131 • Nov 4, 2025 • Updated 8 months ago
lntzm / HICom
[CVPR2025] Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
☆21 • Apr 30, 2025 • Updated last year
haolinyang-hlyang / SoccerMaster
[CVPR 2026 Oral] SoccerMaster: A Vision Foundation Model for Soccer Understanding
☆67 • Updated this week
Code-kunkun / ZS-CIR
[BMVC 2023] Zero-shot Composed Text-Image Retrieval
☆55 • Nov 26, 2024 • Updated last year
mit-han-lab / streaming-vlm
StreamingVLM: Real-Time Understanding for Infinite Video Streams
☆1,046 • Oct 15, 2025 • Updated 9 months ago
cg1177 / Recursive-Multimodal-Agent
☆19 • Jul 1, 2026 • Updated 2 weeks ago
jyrao / MatchTime
[EMNLP 2024 Oral] MatchTime: Towards Automatic Soccer Game Commentary Generation
☆103 • Jan 2, 2025 • Updated last year
haoningwu3639 / SimpleSDM-3
A simple and flexible PyTorch implementation of StableDiffusion-3 based on diffusers for DIY and finetuning.
☆27 • May 28, 2025 • Updated last year
haoningwu3639 / SpatialScore
[CVPR 2026 Highlight] SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence
☆84 • May 28, 2026 • Updated last month
Mark12Ding / Dispider
[CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
☆180 • Mar 23, 2025 • Updated last year
MAGIC-AI4Med / RaTEScore
[EMNLP 2024] RaTEScore: A Metric for Radiology Report Generation
☆67 • May 18, 2025 • Updated last year
Koreyoshi01 / VISD
This repository is the official implementation for VISD.
☆21 • May 17, 2026 • Updated 2 months ago
haowei-freesky / HERMES
Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding" [ACL 2026]
☆92 • May 8, 2026 • Updated 2 months ago
InvincibleWyq / ChatVID
Chat about anything on any video!
☆39 • Sep 5, 2023 • Updated 2 years ago
Echo0125 / MAT-Memory-and-Anticipation-Transformer
[ICCV 2023] Official implementation of Memory-and-Anticipation Transformer for Online Action Understanding
☆50 • Oct 7, 2023 • Updated 2 years ago
MAGIC-AI4Med / KEP
[ECCV 2024 Oral] Knowledge-enhanced pretraining for computational pathology
☆50 • Apr 17, 2026 • Updated 3 months ago
sail-sg / Video-Next-Event-Prediction
☆28 • Aug 9, 2025 • Updated 11 months ago
AIM-SKKU / QA-TIGER
Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official Pytorch Implementation (CVPR'25, Highlight)
☆29 • Jun 6, 2025 • Updated last year
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆65 • Jul 22, 2025 • Updated 11 months ago
jyrao / SoccerAgent
[ACM Multimedia 2025] "Multi-Agent System for Comprehensive Soccer Understanding"
☆82 • Oct 31, 2025 • Updated 8 months ago
sotayang / LiveStar
[NeurIPS'2025] Official repository for "LiveStar: Live Streaming Assistant for Real-World Online Video Understanding"
☆154 • Jul 3, 2026 • Updated 2 weeks ago
marco-garosi / ComCa
Official implementation of the CVPR '25 highlight paper "Compositional Caching for Training-free Open-vocabulary Attribute Detection"
☆23 • Dec 23, 2024 • Updated last year
Espere-1119-Song / VideoNSA
VideoNSA: Native Sparse Attention Scales Video Understanding
☆88 • Nov 16, 2025 • Updated 8 months ago