NVIDIA / transformer-lsLinks

Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).

☆228

Alternatives and similar repositories for transformer-ls

Users that are interested in transformer-ls are comparing it to the libraries listed below

Sorting:

lucidrains / routing-transformer
Fully featured implementation of Routing Transformer
☆296Updated 3 years ago
OpenNLPLab / cosFormer
[ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention
☆196Updated 2 years ago
lucidrains / long-short-transformer
Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch
☆120Updated 4 years ago
lucidrains / linformer
Implementation of Linformer for Pytorch
☆299Updated last year
lucidrains / sinkhorn-transformer
Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention
☆268Updated 4 years ago
sIncerass / powernorm
[ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845
☆120Updated 4 years ago
rishikksh20 / FNet-pytorch
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
☆259Updated 4 years ago
pkuzengqi / Skyformer
Skyformer: Remodel Self-Attention with Gaussian Kernel and Nystr\"om Method (NeurIPS 2021)
☆63Updated 3 years ago
lucidrains / fast-transformer-pytorch
Implementation of Fast Transformer in Pytorch
☆177Updated 4 years ago
lucidrains / flash-cosine-sim-attention
Implementation of fused cosine similarity attention in the same style as Flash Attention
☆217Updated 2 years ago
lukemelas / do-you-even-need-attention
Is the attention layer even necessary? (https://arxiv.org/abs/2105.02723)
☆485Updated 4 years ago
epfml / collaborative-attention
Code for Multi-Head Attention: Collaborate Instead of Concatenate
☆151Updated 2 years ago
tatp22 / linformer-pytorch
My take on a practical implementation of Linformer for Pytorch.
☆420Updated 3 years ago
twistedcubic / attention-rank-collapse
[ICML 2021 Oral] We show pure attention suffers rank collapse, and how different mechanisms combat it.
☆166Updated 4 years ago
lucidrains / memformer
Implementation of Memformer, a Memory-augmented Transformer, in Pytorch
☆123Updated 4 years ago
rish-16 / aft-pytorch
Unofficial PyTorch implementation of Attention Free Transformer (AFT) layers by Apple Inc.
☆243Updated 3 years ago
facebookresearch / mega
Sequence modeling with Mega.
☆300Updated 2 years ago
lucidrains / Mega-pytorch
Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
☆206Updated 2 years ago
lucidrains / compressive-transformer-pytorch
Pytorch implementation of Compressive Transformers, from Deepmind
☆162Updated 4 years ago
google-research / diffstride
TF/Keras code for DiffStride, a pooling layer with learnable strides.
☆124Updated 3 years ago
facebookresearch / data2vec_vision
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
☆80Updated 3 years ago
lucidrains / g-mlp-pytorch
Implementation of gMLP, an all-MLP replacement for Transformers, in Pytorch
☆430Updated 4 years ago
ischlag / fast-weight-transformers
Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers.
☆105Updated 4 years ago
lucidrains / FLASH-pytorch
Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time"
☆369Updated 2 years ago
LiyuanLucasLiu / Transformer-Clinic
Understanding the Difficulty of Training Transformers
☆330Updated 3 years ago
cloneofsimo / realformer-pytorch
Implementation of RealFormer using pytorch
☆101Updated 4 years ago
lucidrains / feedback-transformer-pytorch
Implementation of Feedback Transformer in Pytorch
☆108Updated 4 years ago
XuezheMax / fairseq-apollo
FairSeq repo with Apollo optimizer
☆114Updated last year
ankandrew / online-label-smoothing-pt
Implementation of Online Label Smoothing in PyTorch
☆95Updated 3 years ago
ctlllll / SGConv
☆164Updated 2 years ago