jaketae / g-mlp

PyTorch implementation of Pay Attention to MLPs

☆41

Alternatives and similar repositories for g-mlp

Users who are interested in g-mlp are comparing it to the libraries listed below.

leaderj1001 / Synthesizer-Rethinking-Self-Attention-Transformer-Models
Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using Pytorch
☆70Updated 5 years ago
kuixu / Linear-Multihead-Attention
Reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer: Self-Attention with Linear Complexity)
☆75Updated 5 years ago
lancopku / Explicit-Sparse-Transformer
code for Explicit Sparse Transformer
☆61Updated 2 years ago
CupidJay / MoCov3-pytorch
custom pytorch implementation of MoCo v3
☆46Updated 4 years ago
cheneydon / efficient-bert
This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron …
☆33Updated 2 years ago
10-zin / Synthesizer
A PyTorch implementation of the paper - "Synthesizer: Rethinking Self-Attention in Transformer Models"
☆73Updated 2 years ago
lzy1732008 / GaussionTransformer
For the paper "Gaussian Transformer: A Lightweight Approach for Natural Language Inference"
☆28Updated 5 years ago
davidsvy / cosformer-pytorch
Unofficial PyTorch implementation of the paper "cosFormer: Rethinking Softmax In Attention".
☆44Updated 3 years ago
cloneofsimo / realformer-pytorch
Implementation of RealFormer using pytorch
☆101Updated 4 years ago
haofanwang / awesome-mlp-papers
Recent Advances in MLP-based Models (MLP is all you need!)
☆117Updated 2 years ago
lucidrains / long-short-transformer
Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch
☆120Updated 4 years ago
lancopku / AdaNorm
Code for "Understanding and Improving Layer Normalization"
☆46Updated 5 years ago
lucidrains / hamburger-pytorch
Pytorch implementation of the hamburger module from the ICLR 2021 paper "Is Attention Better Than Matrix Decomposition"
☆99Updated 4 years ago
AvivNavon / AuxiLearn
Official implementation of Auxiliary Learning by Implicit Differentiation [ICLR 2021]
☆86Updated last year
lucidrains / cross-transformers-pytorch
Implementation of Cross Transformer for spatially-aware few-shot transfer, in Pytorch
☆54Updated 4 years ago
NVIDIA / transformer-ls
Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
☆228Updated 3 years ago
rickgroen / cov-weighting
Implementation for our WACV 2021 paper "Multi-Loss Weighting with Coefficient of Variations"
☆51Updated 4 years ago
hunto / ReLoss
Official implementation for paper "Relational Surrogate Loss Learning", ICLR 2022
☆36Updated 2 years ago
lucidrains / fast-transformer-pytorch
Implementation of Fast Transformer in Pytorch
☆177Updated 4 years ago
yaohungt / TransformerDissection
[EMNLP'19] Summary for Transformer Understanding
☆53Updated 5 years ago
pkuzengqi / Skyformer
Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021)
☆63Updated 3 years ago
jackyyy0228 / Order-free-Learning-Alleviating-Exposure-Bias-in-Multi-label-Classification
☆20Updated 5 years ago
jaketae / fnet
PyTorch implementation of FNet: Mixing Tokens with Fourier transforms
☆28Updated 4 years ago
huangleiBuaa / NormalizationSurvey
This repo is for our paper: Normalization Techniques in Training DNNs: Methodology, Analysis and Application
☆85Updated 4 years ago
mlpc-ucsd / BERT_Convolutions
(ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models.
☆21Updated 3 years ago
lucidrains / omninet-pytorch
Implementation of OmniNet, Omnidirectional Representations from Transformers, in Pytorch
☆59Updated 4 years ago
VITA-Group / layerGraftedPretraining_ICLR23
[ICLR 2023] “Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Better Representations”, Ziyu Jian…
☆24Updated 2 years ago
rishikksh20 / rectified-linear-attention
Sparse Attention with Linear Units
☆19Updated 4 years ago
thegregyang / LossUpAccUp
Loss and accuracy go opposite ways...right?
☆95Updated 5 years ago
sIncerass / powernorm
[ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845
☆120Updated 4 years ago