kuixu / Linear-Multihead-Attention
Reproducing the Linear Multihead Attention introduced in the Linformer paper (Linformer: Self-Attention with Linear Complexity)
☆73 · Updated 4 years ago
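As a rough illustration of the technique this repository reproduces, here is a minimal NumPy sketch of Linformer-style attention. It is not this repository's code. It follows the paper's core idea: learned matrices E and F (shape k × n) compress the length-n keys and values down to k rows, so the attention score matrix is n × k instead of n × n and the cost grows linearly in n for a fixed k. The function names, the choice to share E and F across heads, and the random weights in the usage lines are hypothetical, chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def linformer_attention(Q, K, V, E, F):
    """Single-head Linformer-style attention (illustrative sketch).

    Q, K, V: (n, d) query/key/value matrices for a sequence of length n.
    E, F:    (k, n) projections compressing the sequence axis of K and V.
    Returns an (n, d) output; the score matrix is (n, k), not (n, n).
    """
    d = Q.shape[-1]
    K_proj = E @ K                        # (k, d): compressed keys
    V_proj = F @ V                        # (k, d): compressed values
    scores = Q @ K_proj.T / np.sqrt(d)    # (n, k) scaled dot products
    return softmax(scores, axis=-1) @ V_proj  # (n, d)


def linear_multihead_attention(X, Wq, Wk, Wv, Wo, E, F, num_heads):
    """Hypothetical multi-head wrapper that shares E and F across heads."""
    n, d_model = X.shape
    d_head = d_model // num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)  # this head's channels
        heads.append(linformer_attention(Q[:, cols], K[:, cols], V[:, cols], E, F))
    # Concatenate head outputs and apply the output projection.
    return np.concatenate(heads, axis=-1) @ Wo


# Usage with random (hypothetical) weights: n=128 tokens compressed to k=16.
rng = np.random.default_rng(0)
n, d_model, k, num_heads = 128, 64, 16, 4
X = rng.standard_normal((n, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                  for _ in range(4))
E = rng.standard_normal((k, n)) / np.sqrt(k)
F = rng.standard_normal((k, n)) / np.sqrt(k)
out = linear_multihead_attention(X, Wq, Wk, Wv, Wo, E, F, num_heads)
print(out.shape)  # (128, 64)
```

Setting k = n and E = F = identity recovers ordinary scaled dot-product attention, which is a convenient sanity check; the savings only appear when k is much smaller than n.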
Alternatives and similar repositories for Linear-Multihead-Attention:
Users that are interested in Linear-Multihead-Attention are comparing it to the libraries listed below
- PyTorch implementation of "Pay Attention to MLPs" ☆40 · Updated 3 years ago
- Warmup learning rate wrapper for PyTorch schedulers ☆40 · Updated 4 years ago
- Code for Explicit Sparse Transformer ☆60 · Updated last year
- Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Context Terms ☆20 · Updated 3 years ago
- This repo is for our paper: Normalization Techniques in Training DNNs: Methodology, Analysis and Application ☆84 · Updated 3 years ago
- Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using PyTorch ☆70 · Updated 4 years ago
- A guide that integrates PyTorch DistributedDataParallel, Apex, warmup, learning rate scheduler, also mentions the set-up of early-stoppin… ☆63 · Updated 2 years ago
- [ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention ☆185 · Updated 2 years ago
- Implementation of Cross Transformer for spatially-aware few-shot transfer, in PyTorch ☆51 · Updated 3 years ago
- ☆57 · Updated 3 years ago
- Unofficial PyTorch implementation of the paper "cosFormer: Rethinking Softmax In Attention". ☆44 · Updated 3 years ago
- Implementations of Recent Papers in Computer Vision ☆39 · Updated 2 years ago
- WeightNet: Revisiting the Design Space of Weight Networks ☆19 · Updated 4 years ago
- Recent Advances in MLP-based Models (MLP is all you need!) ☆113 · Updated 2 years ago
- [ICLR'22 Oral] Implementation of "CycleMLP: A MLP-like Architecture for Dense Prediction" ☆283 · Updated 2 years ago
- Implementation of Online Label Smoothing in PyTorch ☆94 · Updated 2 years ago
- Official implementation for "Contrastive Learning with Stronger Augmentations" ☆57 · Updated 3 years ago
- [CVPR 2021] Code release for "Unsupervised Feature Learning by Cross-Level Instance-Group Discrimination." ☆100 · Updated 2 years ago
- [ICLR 2023] “Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Better Representations”, Ziyu Jian… ☆23 · Updated 2 years ago
- Official PyTorch implementation of the MixMo framework ☆84 · Updated 3 years ago
- ☆44 · Updated 3 years ago
- PyTorch implementation of the hamburger module from the ICLR 2021 paper "Is Attention Better Than Matrix Decomposition" ☆98 · Updated 4 years ago
- PyTorch implementation of the paper "SuperLoss: A Generic Loss for Robust Curriculum Learning" in NeurIPS 2020. ☆29 · Updated 4 years ago
- MSc group project: Reproduction of 'Multi-Task Learning using Uncertainty to Weigh Losses for Scene Geometry and Semantics'; A. Kendall, … ☆87 · Updated 5 years ago
- MLP-Like Vision Permutator for Visual Recognition (PyTorch) ☆191 · Updated 2 years ago
- Transformers w/o Attention, based fully on MLPs ☆93 · Updated 10 months ago
- PyTorch implementation of the CVPR 2021 paper: SuperMix: Supervising the Mixing Data Augmentation ☆92 · Updated 3 years ago
- Implementation of OmniNet, Omnidirectional Representations from Transformers, in PyTorch ☆57 · Updated 3 years ago
- Open-source release of research work published on arXiv: https://arxiv.org/abs/2106.02689 ☆51 · Updated 3 years ago
- Implementation of Uniformer, a simple attention and 3D convolutional net that achieved SOTA in a number of video classification tasks, de… ☆98 · Updated 2 years ago