lucidrains / sinkhorn-transformer

Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention

☆267

Alternatives and similar repositories for sinkhorn-transformer

Users who are interested in sinkhorn-transformer are comparing it to the libraries listed below.

lucidrains / routing-transformer
Fully featured implementation of Routing Transformer
☆297 Updated 3 years ago
NVIDIA / transformer-ls
Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
☆225 Updated 3 years ago
mlpen / Nystromformer
☆378 Updated last year
tatp22 / linformer-pytorch
My take on a practical implementation of Linformer for Pytorch.
☆417 Updated 3 years ago
KrisKorrel / sparsemax-pytorch
Implementation of Sparsemax activation in Pytorch
☆161 Updated 5 years ago
DeMoriarty / TorchPQ
Approximate nearest neighbor search with product quantization on GPU in pytorch and cuda
☆226 Updated last year
cybertronai / pytorch-lamb
Implementation of https://arxiv.org/abs/1904.00962
☆376 Updated 4 years ago
sIncerass / powernorm
[ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845
☆120 Updated 4 years ago
deep-spin / entmax
The entmax mapping and its loss, a family of sparse softmax alternatives.
☆443 Updated last year
epfml / collaborative-attention
Code for Multi-Head Attention: Collaborate Instead of Concatenate
☆152 Updated 2 years ago
LiyuanLucasLiu / Transformer-Clinic
Understanding the Difficulty of Training Transformers
☆329 Updated 3 years ago
sacmehta / delight
DeLighT: Very Deep and Light-Weight Transformers
☆469 Updated 4 years ago
twistedcubic / attention-rank-collapse
[ICML 2021 Oral] We show pure attention suffers rank collapse, and how different mechanisms combat it.
☆165 Updated 4 years ago
ischlag / fast-weight-transformers
Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers.
☆105 Updated 4 years ago
laiguokun / Funnel-Transformer
☆218 Updated 5 years ago
majumderb / rezero
Official PyTorch Repo for "ReZero is All You Need: Fast Convergence at Large Depth"
☆410 Updated last year
lucidrains / compressive-transformer-pytorch
Pytorch implementation of Compressive Transformers, from Deepmind
☆163 Updated 3 years ago
lukemelas / do-you-even-need-attention
Is the attention layer even necessary? (https://arxiv.org/abs/2105.02723)
☆486 Updated 4 years ago
lucidrains / linformer
Implementation of Linformer for Pytorch
☆294 Updated last year
lucidrains / long-short-transformer
Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch
☆119 Updated 4 years ago
guolinke / TUPE
Transformer with Untied Positional Encoding (TUPE). Code of paper "Rethinking Positional Encoding in Language Pre-training". Improve exis…
☆251 Updated 3 years ago
lucidrains / g-mlp-pytorch
Implementation of gMLP, an all-MLP replacement for Transformers, in Pytorch
☆428 Updated 3 years ago
layer6ai-labs / T-Fixup
Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"
☆89 Updated 4 years ago
LeviViana / torch_sampling
Efficient reservoir sampling implementation for PyTorch
☆106 Updated 3 years ago
rishikksh20 / FNet-pytorch
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
☆259 Updated 4 years ago
lucidrains / h-transformer-1d
Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning
☆163 Updated last year
OpenNLPLab / cosFormer
[ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention
☆196 Updated 2 years ago
lucidrains / feedback-transformer-pytorch
Implementation of Feedback Transformer in Pytorch
☆107 Updated 4 years ago
cpcp1998 / PermuteFormer
Code for the paper PermuteFormer
☆42 Updated 3 years ago
takashiishida / flooding
[ICML 2020] code for the flooding regularizer proposed in "Do We Need Zero Training Loss After Achieving Zero Training Error?"
☆92 Updated 2 years ago