kuixu / Linear-Multihead-Attention
Reproducing the Linear Multihead Attention introduced in the Linformer paper (Linformer: Self-Attention with Linear Complexity)
☆75 · Updated 4 years ago
Alternatives and similar repositories for Linear-Multihead-Attention:
Users who are interested in Linear-Multihead-Attention are comparing it to the libraries listed below.
- PyTorch implementation of Pay Attention to MLPs ☆40 · Updated 3 years ago
- Warmup learning rate wrapper for Pytorch Scheduler ☆41 · Updated 4 years ago
- Recent Advances in MLP-based Models (MLP is all you need!) ☆114 · Updated 2 years ago
- A Pytorch implementation of Global Self-Attention Network, a fully-attention backbone for vision tasks ☆94 · Updated 4 years ago
- code for Explicit Sparse Transformer ☆60 · Updated last year
- Attention mechanism ☆54 · Updated 3 years ago
- Pytorch implementation of the hamburger module from the ICLR 2021 paper "Is Attention Better Than Matrix Decomposition" ☆98 · Updated 4 years ago
- Implementation of Cross Transformer for spatially-aware few-shot transfer, in Pytorch ☆52 · Updated 4 years ago
- ☆191 · Updated 2 years ago
- Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification tasks, de… ☆99 · Updated 2 years ago
- CrossNorm and SelfNorm for Generalization under Distribution Shifts, ICCV 2021 ☆130 · Updated 3 years ago
- Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using Pytorch ☆70 · Updated 4 years ago
- Implementation of Mogrifier LSTM in PyTorch ☆35 · Updated 4 years ago
- Implementation of Online Label Smoothing in PyTorch ☆94 · Updated 2 years ago
- PyTorch code for the paper "CrossTransformers: spatially-aware few-shot transfer" ☆23 · Updated 4 years ago
- ☆48 · Updated 2 years ago
- This repo is for our paper: Normalization Techniques in Training DNNs: Methodology, Analysis and Application ☆84 · Updated 3 years ago
- [NeurIPS 2021] Official codes for "Efficient Training of Visual Transformers with Small Datasets". ☆141 · Updated 2 months ago
- Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning. ☆151 · Updated 2 years ago
- [ICLR'22 Oral] Implementation of "CycleMLP: A MLP-like Architecture for Dense Prediction" ☆286 · Updated 2 years ago
- MLP-Like Vision Permutator for Visual Recognition (PyTorch) ☆191 · Updated 3 years ago
- SKD : Self-supervised Knowledge Distillation for Few-shot Learning ☆96 · Updated last year
- Implementation of OmniNet, Omnidirectional Representations from Transformers, in Pytorch ☆57 · Updated 4 years ago
- Self-distillation with Batch Knowledge Ensembling Improves ImageNet Classification ☆81 · Updated 3 years ago
- Pytorch implementation of CVPR2021 paper: SuperMix: Supervising the Mixing Data Augmentation ☆92 · Updated 3 years ago
- Implementation of the 😇 Attention layer from the paper, Scaling Local Self-Attention For Parameter Efficient Visual Backbones ☆198 · Updated 4 years ago
- Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch ☆118 · Updated 3 years ago
- Improving Contrastive Learning by Visualizing Feature Transformation, ICCV 2021 Oral ☆90 · Updated 3 years ago
- ☆27 · Updated 2 years ago
- AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning ☆112 · Updated 4 years ago