lucidrains / long-short-transformerLinks

Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch

☆120

Alternatives and similar repositories for long-short-transformer

Users that are interested in long-short-transformer are comparing it to the libraries listed below

Sorting:

lucidrains / fast-transformer-pytorch
Implementation of Fast Transformer in Pytorch
☆177Updated 4 years ago
lucidrains / multistream-transformers
Implementation of Multistream Transformers in Pytorch
☆54Updated 4 years ago
NVIDIA / transformer-ls
Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
☆228Updated 3 years ago
lucidrains / axial-positional-embedding
Axial Positional Embedding for Pytorch
☆84Updated 9 months ago
lucidrains / memformer
Implementation of Memformer, a Memory-augmented Transformer, in Pytorch
☆125Updated 5 years ago
lucidrains / n-grammer-pytorch
Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch
☆76Updated 3 years ago
lucidrains / panoptic-transformer
Another attempt at a long-context / efficient transformer by me
☆38Updated 3 years ago
htoyryla / DALLE-pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
☆59Updated 4 years ago
lucidrains / g-mlp-gpt
GPT, but made only out of MLPs
☆89Updated 4 years ago
cpcp1998 / PermuteFormer
Code for the paper PermuteFormer
☆42Updated 4 years ago
10-zin / Synthesizer
A PyTorch implementation of the paper - "Synthesizer: Rethinking Self-Attention in Transformer Models"
☆73Updated 2 years ago
sIncerass / powernorm
[ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845
☆120Updated 4 years ago
lucidrains / omninet-pytorch
Implementation of OmniNet, Omnidirectional Representations from Transformers, in Pytorch
☆59Updated 4 years ago
lucidrains / remixer-pytorch
Implementation of the Remixer Block from the Remixer paper, in Pytorch
☆36Updated 4 years ago
lucidrains / memory-transformer-xl
A variant of Transformer-XL where the memory is updated not with a queue, but with attention
☆49Updated 5 years ago
lucidrains / feedback-transformer-pytorch
Implementation of Feedback Transformer in Pytorch
☆108Updated 4 years ago
lucidrains / compositional-attention-pytorch
Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi…
☆51Updated 3 years ago
lucidrains / Mega-pytorch
Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
☆207Updated 2 years ago
lucidrains / rela-transformer
Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012
☆49Updated 3 years ago
wilile26811249 / Fastformer-PyTorch
Unofficial PyTorch implementation of Fastformer based on paper "Fastformer: Additive Attention Can Be All You Need"."
☆133Updated 4 years ago
epfml / collaborative-attention
Code for Multi-Head Attention: Collaborate Instead of Concatenate
☆152Updated 2 years ago
cloneofsimo / realformer-pytorch
Implementation of RealFormer using pytorch
☆101Updated 4 years ago
giannisdaras / smyrf
[NeurIPS 2020] Official Implementation: "SMYRF: Efficient Attention using Asymmetric Clustering".
☆50Updated 2 years ago
lucidrains / coco-lm-pytorch
Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch
☆46Updated 4 years ago
lucidrains / AoA-pytorch
A Pytorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering
☆43Updated 5 years ago
lucidrains / hamburger-pytorch
Pytorch implementation of the hamburger module from the ICLR 2021 paper "Is Attention Better Than Matrix Decomposition"
☆99Updated 4 years ago
lucidrains / uniformer-pytorch
Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification tasks, de…
☆102Updated 3 years ago
lucidrains / learning-to-expire-pytorch
An implementation of Transformer with Expire-Span, a circuit for learning which memories to retain
☆34Updated 5 years ago
ischlag / fast-weight-transformers
Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers.
☆110Updated 4 years ago
lucidrains / h-transformer-1d
Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning
☆166Updated last year