OpenGVLab / VideoMamba

[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding

☆1,028

Alternatives and similar repositories for VideoMamba

Users who are interested in VideoMamba are comparing it to the repositories listed below.

OpenGVLab / video-mamba-suite
The suite of modeling video with Mamba
☆280 Updated last year
OpenGVLab / VideoMAEv2
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
☆712 Updated last year
kyegomez / VisionMamba
Implementation of Vision Mamba from the paper: "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Mod…
☆477 Updated 2 weeks ago
NVlabs / MambaVision
[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
☆1,902 Updated 4 months ago
SunzeY / AlphaCLIP
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
☆854 Updated 4 months ago
OpenGVLab / Vision-RWKV
[ICLR 2025 Spotlight] Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
☆525 Updated 9 months ago
OpenGVLab / InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
☆2,114 Updated 3 months ago
facebookresearch / hiera
Hiera: A fast, powerful, and simple hierarchical vision transformer.
☆1,041 Updated last year
LeapLabTHU / Agent-Attention
Official repository of Agent Attention (ECCV2024)
☆648 Updated last year
MCG-NJU / VideoMAE
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
☆1,622 Updated last year
Ruixxxx / Awesome-Vision-Mamba-Models
[Official Repo] Visual Mamba: A Survey and New Outlooks
☆719 Updated 9 months ago
gaomingqi / Awesome-Video-Object-Segmentation
🔥 Latest advances in Video Object Segmentation (VOS) – papers, datasets, and projects.
☆428 Updated last week
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆930 Updated 3 months ago
MzeroMiko / VMamba
VMamba: Visual State Space Models, code is based on mamba
☆2,926 Updated 8 months ago
yyyujintang / Awesome-Mamba-Papers
Awesome Papers related to Mamba.
☆1,377 Updated last year
OpenGVLab / VideoChat-Flash
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
☆486 Updated 2 weeks ago
Malitha123 / awesome-video-self-supervised-learning
A curated list of awesome self-supervised learning methods in videos
☆158 Updated 3 weeks ago
OpenGVLab / UniFormerV2
[ICCV2023] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
☆336 Updated last year
Event-AHU / Mamba_State_Space_Model_Paper_List
[Mamba-Survey-2024] Paper list for State-Space-Model/Mamba and its Applications
☆745 Updated 5 months ago
hustvl / Vim
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
☆3,686 Updated 9 months ago
rese1f / MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
☆668 Updated 10 months ago
LTH14 / rcg
PyTorch implementation of RCG https://arxiv.org/abs/2312.03701
☆935 Updated last year
NVlabs / FasterViT
[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention
☆890 Updated 4 months ago
NX-AI / vision-lstm
xLSTM as Generic Vision Backbone
☆490 Updated last month
OpenGVLab / unmasked_teacher
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
☆341 Updated last year
NVlabs / RADIO
Official repository for "AM-RADIO: Reduce All Domains Into One"
☆1,403 Updated last week
PKU-YuanGroup / LanguageBind
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
☆852 Updated last year
facebookresearch / mae_st
Official Open Source code for "Masked Autoencoders As Spatiotemporal Learners"
☆356 Updated last year
ytongbai / LVM
☆1,839 Updated last year
boheumd / MA-LMM
[CVPR 2024] MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
☆343 Updated last year