lucidrains / linear-attention-transformerLinks

Transformer based on a variant of attention that is linear complexity in respect to sequence length

☆811

Alternatives and similar repositories for linear-attention-transformer

Users that are interested in linear-attention-transformer are comparing it to the libraries listed below

Sorting:

idiap / fast-transformers
Pytorch library for fast transformer implementations
☆1,749Updated 2 years ago
lucidrains / performer-pytorch
An implementation of Performer, a linear attention-based transformer, in Pytorch
☆1,156Updated 3 years ago
lucidrains / linformer
Implementation of Linformer for Pytorch
☆301Updated last year
lucidrains / local-attention
An implementation of local windowed attention for language modeling
☆484Updated 4 months ago
google-research / long-range-arena
Long Range Arena for Benchmarking Efficient Transformers
☆767Updated last year
lucidrains / mlp-mixer-pytorch
An All-MLP solution for Vision, from Google AI
☆1,050Updated 4 months ago
lucidrains / rotary-embedding-torch
Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch
☆779Updated 3 months ago
tatp22 / linformer-pytorch
My take on a practical implementation of Linformer for Pytorch.
☆421Updated 3 years ago
lucidrains / mixture-of-experts
A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
☆825Updated 2 years ago
lucidrains / reformer-pytorch
Reformer, the efficient Transformer, in Pytorch
☆2,184Updated 2 years ago
lucidrains / perceiver-pytorch
Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch
☆1,176Updated 2 years ago
lucidrains / memory-efficient-attention-pytorch
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
☆383Updated 2 years ago
lucidrains / FLASH-pytorch
Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time"
☆369Updated 2 years ago
lucidrains / routing-transformer
Fully featured implementation of Routing Transformer
☆296Updated 4 years ago
sail-sg / Adan
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
☆802Updated 5 months ago
lucidrains / g-mlp-pytorch
Implementation of gMLP, an all-MLP replacement for Transformers, in Pytorch
☆430Updated 4 years ago
locuslab / convmixer
Implementation of ConvMixer for "Patches Are All You Need? 🤷"
☆1,077Updated 3 years ago
fadel / pytorch_ema
Tiny PyTorch library for maintaining a moving average of a collection of parameters.
☆439Updated last year
davidmrau / mixture-of-experts
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
☆1,203Updated last year
lucidrains / sinkhorn-transformer
Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention
☆268Updated 4 years ago
lucidrains / ema-pytorch
A simple way to keep track of an Exponential Moving Average (EMA) version of your Pytorch model
☆621Updated 11 months ago
mlpen / Nystromformer
☆385Updated 2 years ago
tatp22 / multidim-positional-encoding
An implementation of 1D, 2D, and 3D positional encoding in Pytorch and TensorFlow
☆609Updated last year
Tony-Y / pytorch_warmup
Learning Rate Warmup in PyTorch
☆413Updated 4 months ago
teddykoker / torchsort
Fast, differentiable sorting and ranking in PyTorch
☆844Updated 5 months ago
ofirpress / attention_with_linear_biases
Code for the ALiBi method for transformer language models (ICLR 2022)
☆543Updated 2 years ago
facebookresearch / mega
Sequence modeling with Mega.
☆301Updated 2 years ago
krasserm / perceiver-io
A PyTorch implementation of Perceiver, Perceiver IO and Perceiver AR with PyTorch Lightning scripts for distributed training
☆496Updated last year
karpathy / deep-vector-quantization
VQVAEs, GumbelSoftmaxes and friends
☆595Updated 3 years ago
NVIDIA / transformer-ls
Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
☆228Updated 3 years ago