SHI-Labs/Slow-Fast-Video-Multimodal-LLM

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/SHI-Labs/Slow-Fast-Video-Multimodal-LLM)

SHI-Labs / Slow-Fast-Video-Multimodal-LLM

☆29

Alternatives and similar repositories for Slow-Fast-Video-Multimodal-LLM

Users that are interested in Slow-Fast-Video-Multimodal-LLM are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

jylins / hourllava
View on GitHub
[NeurIPS 2025 Spotlight] Unleashing Hour-Scale Video Training for Long Video-Language Understanding
☆19Jun 24, 2025Updated last year
showlab / MovieSeq
View on GitHub
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆46Mar 11, 2025Updated last year
SalesforceAIResearch / strefer
View on GitHub
Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data
☆19Jun 2, 2026Updated last month
nusnlp / d2vlm
View on GitHub
[ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models
☆24Apr 18, 2026Updated 3 months ago
NVlabs / VideoITG
View on GitHub
[CVPR 2026 Highlight] VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding
☆126Apr 17, 2026Updated 3 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
aiha-lab / InfiniPot-V
View on GitHub
[NeurIPS 25] InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
☆20Jan 25, 2026Updated 5 months ago
liuting20 / SwimVG
View on GitHub
Transactions on Multimedia (TMM25)
☆21Apr 8, 2025Updated last year
lcqysl / FrameThinker
View on GitHub
[ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"
☆50Oct 9, 2025Updated 9 months ago
bytedance / F-16
View on GitHub
F-16 is a powerful video large language model (LLM) that perceives high-frame-rate videos, which is developed by the Department of Electr…
☆40Jul 3, 2025Updated last year
Jialuo-Li / DIG
View on GitHub
[CVPR 2026] Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
☆21Feb 21, 2026Updated 5 months ago
HuiGuanLab / RaTSG
View on GitHub
This is a repository contains the implementation of our NeurIPS'24 paper "Temporal Sentence Grounding with Relevance Feedback in Videos"
☆13Aug 22, 2025Updated 11 months ago
TIGER-AI-Lab / VideoEval-Pro
View on GitHub
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation [TMLR26]
☆15Jun 1, 2026Updated last month
PolyU-ChenLab / ETBench
View on GitHub
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
☆74Jan 20, 2025Updated last year
mlvlab / DeepVideoR1
View on GitHub
[NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1"
☆36Feb 22, 2026Updated 5 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
zjucsq / PLA
View on GitHub
[ICLR2023] Video Scene Graph Generation from Single-Frame Weak Supervision
☆12Sep 17, 2023Updated 2 years ago
Event-AHU / Research-Pathways-for-Newcomers
View on GitHub
新生入学必读材料
☆15Jun 2, 2026Updated last month
yangjie-cv / WeThink
View on GitHub
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning
☆36Jun 10, 2025Updated last year
hwjiang1510 / VQLoC
View on GitHub
(NeurIPS 2023) Open-set visual object query search & localization in long-form videos
☆26Feb 1, 2024Updated 2 years ago
marinero4972 / Open-o3-Video
View on GitHub
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆157May 1, 2026Updated 2 months ago
Lzq5 / UniTime
View on GitHub
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
☆56May 20, 2026Updated 2 months ago
OpenGVLab / TPO
View on GitHub
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆65Jul 22, 2025Updated last year
EdenGabriel / TaskWeave
View on GitHub
[CVPR 2024 Accepted] TaskWeave: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection
☆30Sep 26, 2024Updated last year
pufanyi / syphus
View on GitHub
Syphus: Automatic Instruction-Response Generation Pipeline
☆14Dec 14, 2023Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
zhang9302002 / ThinkingWithVideos
View on GitHub
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
☆102Oct 15, 2025Updated 9 months ago
microsoft / VLM-Video-Action-Localization
View on GitHub
☆26Jun 6, 2025Updated last year
thearkaprava / MS-Temba
View on GitHub
[CVPR 2026] Official Repository of 'MS-Temba: Multi-Scale Temporal Mamba for Understanding Long Untrimmed Videos'
☆48Jun 22, 2026Updated last month
patrick-0817 / T-MASS-dataleakage
View on GitHub
☆10Nov 27, 2024Updated last year
TIGER-AI-Lab / Vamba
View on GitHub
Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers" [ICCV 2025]
☆107Jul 28, 2025Updated 11 months ago
luo-ziyuan / NeRF_Signature
View on GitHub
Source code of the paper "The NeRF Signature: Codebook-Aided Watermarking for Neural Radiance Fields".
☆18Mar 3, 2025Updated last year
marinero4972 / VideoZeroBench
View on GitHub
Official implementation of "VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification"
☆21May 7, 2026Updated 2 months ago
allenai / SAGE
View on GitHub
[arXiv 2025] SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
☆70Dec 17, 2025Updated 7 months ago
kumuji / Sa2VA-i
View on GitHub
Sa2VA-i is an improved version of the popular Sa2VA model
☆16Nov 25, 2025Updated 7 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
snap-research / VIMI
View on GitHub
☆13Jul 10, 2024Updated 2 years ago
NIneeeeeem / LangDC
View on GitHub
[EMNLP 2025 Oral] Official codebase for Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors.
☆18Sep 7, 2025Updated 10 months ago
yeliudev / VideoMind
View on GitHub
🧠 VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning (ICLR 2026)
☆348Feb 8, 2026Updated 5 months ago
EIT-NLP / Speak-While-Watching
View on GitHub
☆17Mar 1, 2026Updated 4 months ago
Time-Search / TimeSearch-R
View on GitHub
[ICLR 2026] Official code for paper: TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinf…
☆27Jan 29, 2026Updated 5 months ago
cg1177 / Recursive-Multimodal-Agent
View on GitHub
☆19Jul 1, 2026Updated 3 weeks ago
Hon-Wong / ByteVideoLLM
View on GitHub
[ICCV 2025] Dynamic-VLM
☆28Dec 16, 2024Updated last year