leaderj1001 / Synthesizer-Rethinking-Self-Attention-Transformer-Models

Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using Pytorch

☆70

Alternatives and similar repositories for Synthesizer-Rethinking-Self-Attention-Transformer-Models

Users who are interested in Synthesizer-Rethinking-Self-Attention-Transformer-Models are comparing it to the repositories listed below.

10-zin / Synthesizer
A PyTorch implementation of the paper - "Synthesizer: Rethinking Self-Attention in Transformer Models"
☆73 Updated 2 years ago
cloneofsimo / realformer-pytorch
Implementation of RealFormer using pytorch
☆100 Updated 4 years ago
lancopku / Explicit-Sparse-Transformer
code for Explicit Sparse Transformer
☆62 Updated 2 years ago
jaketae / g-mlp
PyTorch implementation of Pay Attention to MLPs
☆40 Updated 4 years ago
Junya-Chen / FlatCLR
FlatNCE: A Novel Contrastive Representation Learning Objective
☆90 Updated 3 years ago
twistedcubic / attention-rank-collapse
[ICML 2021 Oral] We show pure attention suffers rank collapse, and how different mechanisms combat it.
☆165 Updated 4 years ago
xuanqing94 / FLOATER
Learning to Encode Position for Transformer with Continuous Dynamical Model
☆60 Updated 5 years ago
lucidrains / long-short-transformer
Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch
☆119 Updated 4 years ago
kuixu / Linear-Multihead-Attention
Reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer: Self-Attention with Linear Complexity)
☆76 Updated 5 years ago
lzy1732008 / GaussionTransformer
For the paper "Gaussian Transformer: A Lightweight Approach for Natural Language Inference"
☆28 Updated 5 years ago
CupidJay / MoCov3-pytorch
custom pytorch implementation of MoCo v3
☆46 Updated 4 years ago
cheneydon / efficient-bert
This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron …
☆33 Updated 2 years ago
epfml / collaborative-attention
Code for Multi-Head Attention: Collaborate Instead of Concatenate
☆152 Updated 2 years ago
lancopku / AdaNorm
Code for "Understanding and Improving Layer Normalization"
☆46 Updated 5 years ago
sIncerass / powernorm
[ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845
☆120 Updated 4 years ago
OpenNLPLab / cosFormer
[ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention
☆196 Updated 2 years ago
bojone / univae
A Transformer-based single-model, multi-scale VAE
☆57 Updated 4 years ago
lucidrains / omninet-pytorch
Implementation of OmniNet, Omnidirectional Representations from Transformers, in Pytorch
☆58 Updated 4 years ago
fawazsammani / mogrifier-lstm-pytorch
Implementation of Mogrifier LSTM in PyTorch
☆34 Updated 5 years ago
intersun / CoDIR
Code for EMNLP 2020 paper CoDIR
☆41 Updated 2 years ago
davidsvy / cosformer-pytorch
Unofficial PyTorch implementation of the paper "cosFormer: Rethinking Softmax In Attention".
☆44 Updated 3 years ago
lucidrains / distilled-retriever-pytorch
Implementation of the retriever distillation procedure as outlined in the paper "Distilling Knowledge from Reader to Retriever"
☆32 Updated 4 years ago
CharizardAcademy / convtransformer
Code for the ACL2020 paper Character-Level Translation with Self-Attention
☆31 Updated 4 years ago
linzehui / Curriculum-Learning-PaperList-Materials
Curriculum Learning related papers and materials
☆54 Updated 4 years ago
lioutasb / TaLKConvolutions
Official PyTorch implementation of Time-aware Large Kernel (TaLK) Convolutions (ICML 2020)
☆29 Updated 4 years ago
jackyyy0228 / Order-free-Learning-Alleviating-Exposure-Bias-in-Multi-label-Classification
☆20 Updated 5 years ago
lucidrains / memformer
Implementation of Memformer, a Memory-augmented Transformer, in Pytorch
☆119 Updated 4 years ago
RMichaelSwan / MogrifierLSTM
A quick walk-through of the innards of LSTMs and a naive implementation of the Mogrifier LSTM paper in PyTorch
☆78 Updated 4 years ago
TensorUI / relative-position-pytorch
a pytorch implementation of self-attention with relative position representations
☆50 Updated 4 years ago
zlinao / Variational-Transformer
Variational Transformers for Diverse Response Generation
☆81 Updated last year