google-deepmind / dmvr
☆65 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for dmvr
- Download scripts for EPIC-KITCHENS ☆124 · Updated 3 months ago
- HVU Downloader tool ☆17 · Updated 4 years ago
- CLIP-It! Language-Guided Video Summarization ☆73 · Updated 3 years ago
- Code Release for MeMViT Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition, CVPR 2022 ☆145 · Updated last year
- PyTorch GPU distributed training code for MIL-NCE HowTo100M ☆214 · Updated 2 years ago
- Datasets, transforms and samplers for video in PyTorch ☆86 · Updated last year
- Self-Supervised Learning by Cross-Modal Audio-Video Clustering (NeurIPS 2020) ☆90 · Updated 2 years ago
- Implementations of Transformers for Video ☆24 · Updated 3 years ago
- Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification ☆130 · Updated 3 years ago
- PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral) ☆120 · Updated last year
- S3D Text-Video model trained on HowTo100M using MIL-NCE ☆191 · Updated 4 years ago
- ☆69 · Updated last year
- Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound" ☆137 · Updated 2 years ago
- Code for the HowTo100M paper ☆252 · Updated 4 years ago
- The implementation of CVPR2021 paper Temporal Query Networks for Fine-grained Video Understanding ☆62 · Updated 2 years ago
- Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers. ☆49 · Updated 2 years ago
- Video Contrastive Learning with Global Context, ICCVW 2021 ☆158 · Updated 2 years ago
- [NeurIPS 2021 Spotlight] Official implementation of Long Short-Term Transformer for Online Action Detection ☆129 · Updated 3 months ago
- The Holistic Video Understanding Dataset (ECCV 2020 Spotlight presentation) ☆70 · Updated 3 years ago
- ☆74 · Updated 2 years ago
- ☆21 · Updated 11 months ago
- Audio Visual Instance Discrimination with Cross-Modal Agreement ☆127 · Updated 3 years ago
- Listen to Look: Action Recognition by Previewing Audio (CVPR 2020) ☆127 · Updated 3 years ago
- Use CLIP to represent video for Retrieval Task ☆69 · Updated 3 years ago
- Feature Extractor module for videos using the PySlowFast framework ☆77 · Updated 3 years ago
- A PyTorch implementation of VIOLET ☆137 · Updated 11 months ago
- ☆102 · Updated last year
- [TIP 2022] End-to-end Temporal Action Detection with Transformer ☆144 · Updated last year
- This is an official implementation of TubeR: Tubelet Transformer for Video Action Detection ☆71 · Updated last year
- [ECCV 2022] Official PyTorch implementation of the paper "Zero-Shot Temporal Action Detection via Vision-Language Prompting" ☆98 · Updated last year