yuexy / PS-ViT
Official implementation of the paper Vision Transformer with Progressive Sampling, ICCV 2021.
☆153 · Updated 3 years ago
Alternatives and similar repositories for PS-ViT
Users who are interested in PS-ViT are comparing it to the libraries listed below.
- DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM 2021) ☆156 · Updated 3 years ago
- MLP-Like Vision Permutator for Visual Recognition (PyTorch) ☆191 · Updated 3 years ago
- ☆190 · Updated 2 years ago
- Official Codes for "Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality" ☆242 · Updated 2 years ago
- [CVPR2022 - Oral] Official Jax Implementation of Learned Queries for Efficient Local Attention ☆118 · Updated 3 years ago
- Official implementation of the paper "Unifying Nonlocal Blocks for Neural Networks" (ICCV'21) ☆98 · Updated 3 years ago
- [CVPR 2021] Instance Localization for Self-supervised Detection Pretraining ☆145 · Updated 4 years ago
- LoMaR (Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction) ☆64 · Updated 2 months ago
- Accelerating T2T-ViT by 1.6-3.6x. ☆252 · Updated 3 years ago
- [CVPR 2022 Oral] Crafting Better Contrastive Views for Siamese Representation Learning ☆287 · Updated 2 years ago
- ☆138 · Updated 3 years ago
- Official code for paper "On the Connection between Local Attention and Dynamic Depth-wise Convolution" ICLR 2022 Spotlight ☆184 · Updated 2 years ago
- The official repo of the CVPR2021 oral paper: Representative Batch Normalization with Feature Calibration ☆85 · Updated 2 years ago
- ☆109 · Updated 3 years ago
- [CVPR 2022] This repository includes the official project for the paper: TransMix: Attend to Mix for Vision Transformers. ☆155 · Updated 2 years ago
- Reducing spatial redundancy in video recognition. SOTA computational efficiency. ☆124 · Updated 6 months ago
- This is a PyTorch implementation of "Context AutoEncoder for Self-Supervised Representation Learning" ☆197 · Updated 2 years ago
- Implementation of the 😇 Attention layer from the paper, Scaling Local Self-Attention For Parameter Efficient Visual Backbones ☆199 · Updated 4 years ago
- ☆216 · Updated 3 years ago
- The official implementation of ELSA: Enhanced Local Self-Attention for Vision Transformer ☆116 · Updated last year
- MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning ☆143 · Updated last year
- ☆119 · Updated 3 years ago
- ☆57 · Updated 3 years ago
- ECCV 2022, Bootstrapped Masked Autoencoders for Vision BERT Pretraining ☆97 · Updated 2 years ago
- [ICLR'22] This is an official implementation for "AS-MLP: An Axial Shifted MLP Architecture for Vision". ☆126 · Updated 2 years ago
- ☆258 · Updated 2 years ago
- A Close Look at Spatial Modeling: From Attention to Convolution ☆91 · Updated 2 years ago
- Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification tasks, de… ☆101 · Updated 3 years ago
- Official implementation of the paper Visual Parser: Representing Part-whole Hierarchies with Transformers ☆121 · Updated 3 years ago
- ☆98 · Updated 3 years ago