JuanFMontesinos / PyNVIdeoReader
GPU-accelerated video decoder
☆20Updated 4 years ago
Alternatives and similar repositories for PyNVIdeoReader
Users who are interested in PyNVIdeoReader are comparing it to the libraries listed below.
- ☆72Updated last year
- ☆68Updated 2 years ago
- Code Release for MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition, CVPR 2022☆149Updated 2 years ago
- [CVPR2023] Code for "Streaming Video Model"☆79Updated 2 years ago
- Official source code for "Continual 3D Convolutional Neural Networks for Real-time Processing of Videos" [ECCV2022]☆45Updated 2 years ago
- ☆177Updated 3 years ago
- Code and models for the paper "The effectiveness of MAE pre-pretraining for billion-scale pretraining" https://arxiv.org/abs/2303.13496☆92Updated 5 months ago
- [WACV'22] Code repository for the paper "Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting", https…☆36Updated 3 years ago
- Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch☆103Updated last year
- ☆47Updated 3 years ago
- ☆56Updated 3 years ago
- ViT trained on COYO-Labeled-300M dataset☆32Updated 2 years ago
- An unofficial implementation of TubeViT in "Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning"☆92Updated last year
- Code repository for "It's About Time: Analog Clock Reading in the Wild"☆78Updated last year
- Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification☆134Updated 4 years ago
- "Object-Region Video Transformers", Herzig et al., CVPR 2022☆48Updated 3 years ago
- VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automa…☆78Updated 2 years ago
- Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification tasks, de…☆102Updated 3 years ago
- A library of transformer models for computer vision and multi-modality research☆49Updated 4 years ago
- A task-agnostic vision-language architecture as a step towards General Purpose Vision☆92Updated 4 years ago
- Multimodal video-audio-text generation and retrieval between every pair of modalities on the MUGEN dataset. The repo. contains the traini…☆40Updated 2 years ago
- Video Contrastive Learning with Global Context, ICCVW 2021☆159Updated 3 years ago
- Code + pre-trained models for the paper Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers☆231Updated 3 years ago
- Code for the Video Similarity Challenge.☆80Updated last year
- ☆109Updated 2 years ago
- Code for Temporal Data Augmentations (ECCVW 2020)☆37Updated 5 years ago
- Datasets, transforms and samplers for video in PyTorch☆88Updated last year
- [NeurIPS'22] ReCo: Retrieve and Co-segment for Zero-shot Transfer☆62Updated 2 years ago
- ☆19Updated 4 months ago
- ECCV2022, Bootstrapped Masked Autoencoders for Vision BERT Pretraining☆97Updated 2 years ago