kuixu / Linear-Multihead-Attention

Reproducing the Linear Multihead Attention introduced in the Linformer paper (Linformer: Self-Attention with Linear Complexity)

☆76

Alternatives and similar repositories for Linear-Multihead-Attention

Users who are interested in Linear-Multihead-Attention are comparing it to the repositories listed below

jaketae / g-mlp
PyTorch implementation of Pay Attention to MLPs
☆40 Updated 4 years ago
huangleiBuaa / NormalizationSurvey
This repo is for our paper: Normalization Techniques in Training DNNs: Methodology, Analysis and Application
☆85 Updated 4 years ago
ankandrew / online-label-smoothing-pt
Implementation of Online Label Smoothing in PyTorch
☆94 Updated 2 years ago
haofanwang / awesome-mlp-papers
Recent Advances in MLP-based Models (MLP is all you need!)
☆116 Updated 2 years ago
alexrame / mixmo-pytorch
Official Pytorch implementation of MixMo framework
☆84 Updated 3 years ago
lucidrains / global-self-attention-network
A Pytorch implementation of Global Self-Attention Network, a fully-attention backbone for vision tasks
☆95 Updated 4 years ago
lehduong / torch-warmup-lr
Warmup learning rate wrapper for Pytorch Scheduler
☆42 Updated 5 years ago
Chenglin-Yang / LESA
Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Context Terms
☆20 Updated 3 years ago
houqb / VisionPermutator
MLP-Like Vision Permutator for Visual Recognition (PyTorch)
☆191 Updated 3 years ago
CupidJay / MoCov3-pytorch
custom pytorch implementation of MoCo v3
☆46 Updated 4 years ago
OpenNLPLab / cosFormer
[ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention
☆196 Updated 2 years ago
leaderj1001 / Synthesizer-Rethinking-Self-Attention-Transformer-Models
Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using Pytorch
☆70 Updated 5 years ago
sunxm2357 / AdaShare
AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning
☆112 Updated 4 years ago
snu-mllab / Co-Mixup
Official PyTorch implementation of "Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity" (ICLR'21 Oral)
☆104 Updated 3 years ago
AvivNavon / AuxiLearn
Official implementation of Auxiliary Learning by Implicit Differentiation [ICLR 2021]
☆84 Updated last year
lucidrains / cross-transformers-pytorch
Implementation of Cross Transformer for spatially-aware few-shot transfer, in Pytorch
☆53 Updated 4 years ago
lnsmith54 / CFL
☆95 Updated 2 years ago
lironui / Linear-Attention-Mechanism
Attention mechanism
☆53 Updated 3 years ago
alldbi / SuperMix
Pytorch implementation of CVPR2021 paper: SuperMix: Supervising the Mixing Data Augmentation
☆92 Updated 3 years ago
lucidrains / omninet-pytorch
Implementation of OmniNet, Omnidirectional Representations from Transformers, in Pytorch
☆58 Updated 4 years ago
lucidrains / hamburger-pytorch
Pytorch implementation of the hamburger module from the ICLR 2021 paper "Is Attention Better Than Matrix Decomposition"
☆99 Updated 4 years ago
UMBCvision / MSF
Official code for "Mean Shift for Self-Supervised Learning"
☆57 Updated 3 years ago
rishikksh20 / MLP-Mixer-pytorch
Unofficial implementation of MLP-Mixer: An all-MLP Architecture for Vision
☆218 Updated 4 years ago
szq0214 / Un-Mix
Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning.
☆151 Updated 2 years ago
wvangansbeke / Revisiting-Contrastive-SSL
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]
☆89 Updated 3 years ago
bellymonster / Weighted-Soft-Label-Distillation
☆57 Updated 4 years ago
yhlleo / VTs-Drloc
[NeurIPS 2021] Official codes for "Efficient Training of Visual Transformers with Small Datasets".
☆144 Updated 7 months ago
lancopku / Explicit-Sparse-Transformer
code for Explicit Sparse Transformer
☆62 Updated 2 years ago
yukimasano / linear-probes
Evaluating AlexNet features at various depths
☆40 Updated 4 years ago
DTennant / CL-Visualizing-Feature-Transformation
Improving Contrastive Learning by Visualizing Feature Transformation, ICCV 2021 Oral
☆90 Updated 3 years ago