xiaobai1217 / Awesome-Video-Datasets
Video datasets
☆1,216 Updated last year
Related projects
Alternatives and complementary repositories for Awesome-Video-Datasets
- ☆778 Updated 6 months ago
- [ECCV2024] Video Foundation Models & Data for Multimodal Understanding ☆1,421 Updated this week
- Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and T… ☆536 Updated 3 weeks ago
- The official PyTorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?" ☆1,560 Updated 7 months ago
- Hiera: A fast, powerful, and simple hierarchical vision transformer. ☆901 Updated 8 months ago
- [NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training ☆1,380 Updated 11 months ago
- A collection of papers on the topic of "Computer Vision in the Wild (CVinW)" ☆1,203 Updated 8 months ago
- Generic PyTorch dataset implementation to load and augment VIDEOS for deep learning training loops. ☆451 Updated last year
- [CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking ☆520 Updated last month
- Awesome list for research on CLIP (Contrastive Language-Image Pre-Training). ☆1,137 Updated 4 months ago
- Ego4d dataset repository. Download the dataset, visualize, extract features & example usage of the dataset ☆360 Updated this week
- Grounded Language-Image Pre-training ☆2,231 Updated 9 months ago
- Recent Transformer-based CV and related works. ☆1,324 Updated last year
- Scenic: A Jax Library for Computer Vision Research and Beyond ☆3,334 Updated last month
- The official implementation of the paper "ActionCLIP: A New Paradigm for Action Recognition" ☆518 Updated 11 months ago
- Temporal Action Detection & Weakly Supervised Temporal Action Detection & Temporal Action Proposal Generation ☆443 Updated last week
- Omnivore: A Single Model for Many Visual Modalities ☆559 Updated 2 years ago
- A curated list of prompt-based papers in computer vision and vision-language learning. ☆897 Updated 11 months ago
- Code release for "Learning Video Representations from Large Language Models" ☆492 Updated last year
- A deep learning library for video understanding research. ☆3,334 Updated 3 months ago
- ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in co… ☆935 Updated 2 months ago
- PyTorch implementation of a collection of scalable Video Transformer Benchmarks. ☆283 Updated 2 years ago
- This repository collects various multi-modal transformer architectures, including image transformer, video transformer, image-languag… ☆219 Updated 2 years ago
- A curated list of deep learning resources for video-text retrieval. ☆594 Updated last year
- A method to increase the speed and lower the memory footprint of existing vision transformers. ☆970 Updated 5 months ago
- A curated list of different papers and datasets in various areas of audio-visual processing ☆670 Updated 9 months ago
- This is an official implementation for "Video Swin Transformers". ☆1,451 Updated last year
- ☆463 Updated 2 weeks ago
- This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo] ☆663 Updated 4 months ago
- Code release for ActionFormer (ECCV 2022) ☆444 Updated 7 months ago