MCG-NJU / VideoMAELinks

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

☆1,552

Alternatives and similar repositories for VideoMAE

Users that are interested in VideoMAE are comparing it to the libraries listed below

Sorting:

OpenGVLab / VideoMAEv2
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
☆669Updated 9 months ago
SwinTransformer / Video-Swin-Transformer
This is an official implementation for "Video Swin Transformers".
☆1,568Updated 2 years ago
facebookresearch / TimeSformer
The official pytorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"
☆1,732Updated last year
cvdfoundation / kinetics-dataset
☆874Updated last year
microsoft / VideoX
VideoX: a collection of video cross-modal models
☆1,036Updated last year
Sense-X / UniFormer
[ICLR2022] official implementation of UniFormer
☆876Updated last year
happyharrycn / actionformer_release
Code release for ActionFormer (ECCV 2022)
☆508Updated last year
rishikksh20 / ViViT-pytorch
Implementation of ViViT: A Video Vision Transformer
☆541Updated 4 years ago
OpenGVLab / VideoMamba
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
☆983Updated last year
OpenGVLab / InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
☆1,984Updated last month
facebookresearch / mvit
Code Release for MViTv2 on Image Recognition.
☆435Updated 8 months ago
mx-mark / VideoTransformer-pytorch
PyTorch implementation of a collections of scalable Video Transformer Benchmarks.
☆300Updated 3 years ago
sallymmx / ActionCLIP
This is the official implement of paper "ActionCLIP: A New Paradigm for Action Recognition"
☆566Updated last year
ArrowLuo / CLIP4Clip
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
☆974Updated last year
OpenGVLab / UniFormerV2
[ICCV2023] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
☆324Updated last year
facebookresearch / hiera
Hiera: A fast, powerful, and simple hierarchical vision transformer.
☆1,008Updated last year
google-research / scenic
Scenic: A Jax Library for Computer Vision Research and Beyond
☆3,618Updated 3 weeks ago
lucidrains / CoCa-pytorch
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch
☆1,162Updated last year
v-iashin / video_features
Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and T…
☆606Updated 6 months ago
haofanwang / video-swin-transformer-pytorch
Video Swin Transformer - PyTorch
☆260Updated 3 years ago
zhenyingfang / Awesome-Temporal-Action-Detection-Temporal-Action-Proposal-Generation
Temporal Action Detection & Weakly Supervised Temporal Action Detection & Temporal Action Proposal Generation
☆515Updated last month
microsoft / SimMIM
This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".
☆988Updated 2 years ago
lucidrains / TimeSformer-pytorch
Implementation of TimeSformer from Facebook AI, a pure attention-based solution for video classification
☆719Updated 3 years ago
microsoft / GLIP
Grounded Language-Image Pre-training
☆2,472Updated last year
facebookresearch / mae_st
Official Open Source code for "Masked Autoencoders As Spatiotemporal Learners"
☆346Updated 8 months ago
KMnP / vpt
❄️🔥 Visual Prompt Tuning [ECCV 2022] https://arxiv.org/abs/2203.12119
☆1,142Updated last year
jianzongwu / Awesome-Open-Vocabulary
(TPAMI 2024) A Survey on Open Vocabulary Learning
☆944Updated 4 months ago
taoyang1122 / adapt-image-models
[ICLR'23] AIM: Adapting Image Models for Efficient Video Action Recognition
☆291Updated last year
OpenGVLab / unmasked_teacher
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
☆336Updated last year
OpenGVLab / video-mamba-suite
The suite of modeling video with Mamba
☆279Updated last year