10-zin / SynthesizerLinks

A PyTorch implementation of the paper - "Synthesizer: Rethinking Self-Attention in Transformer Models"

☆73

Alternatives and similar repositories for Synthesizer

Users that are interested in Synthesizer are comparing it to the libraries listed below

Sorting:

lucidrains / coco-lm-pytorch
Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch
☆46Updated 4 years ago
epfml / collaborative-attention
Code for Multi-Head Attention: Collaborate Instead of Concatenate
☆151Updated 2 years ago
leaderj1001 / Synthesizer-Rethinking-Self-Attention-Transformer-Models
Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using Pytorch
☆70Updated 5 years ago
cloneofsimo / realformer-pytorch
Implementation of RealFormer using pytorch
☆101Updated 4 years ago
lucidrains / distilled-retriever-pytorch
Implementation of the retriever distillation procedure as outlined in the paper "Distilling Knowledge from Reader to Retriever"
☆32Updated 4 years ago
lucidrains / long-short-transformer
Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch
☆120Updated 4 years ago
intersun / CoDIR
Code for EMNLP 2020 paper CoDIR
☆41Updated 3 years ago
Rick-McCoy / Reformer-pytorch
Implements Reformer: The Efficient Transformer in pytorch.
☆86Updated 5 years ago
bojone / univae
基于Transformer的单模型、多尺度的VAE模型
☆57Updated 4 years ago
lioutasb / TaLKConvolutions
Official PyTorch implementation of Time-aware Large Kernel (TaLK) Convolutions (ICML 2020)
☆29Updated 4 years ago
nng555 / ssmba
☆62Updated 3 years ago
lancopku / AdaNorm
Code for "Understanding and Improving Layer Normalization"
☆46Updated 5 years ago
prajjwal1 / adaptive_transformer
Code for the paper "Adaptive Transformers for Learning Multimodal Representations" (ACL SRW 2020)
☆43Updated 3 years ago
lucidrains / memory-transformer-xl
A variant of Transformer-XL where the memory is updated not with a queue, but with attention
☆49Updated 5 years ago
CAMTL / CA-MTL
Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data
☆57Updated 4 years ago
harvardnlp / cascaded-generation
Cascaded Text Generation with Markov Transformers
☆129Updated 2 years ago
wilile26811249 / Fastformer-PyTorch
Unofficial PyTorch implementation of Fastformer based on paper "Fastformer: Additive Attention Can Be All You Need"."
☆133Updated 4 years ago
lancopku / Prime
A simple module consistently outperforms self-attention and Transformer model on main NMT datasets with SoTA performance.
☆86Updated 2 years ago
layer6ai-labs / T-Fixup
Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"
☆89Updated 4 years ago
linzehui / Curriculum-Learning-PaperList-Materials
Curriculum Learning related papers and materials
☆53Updated 4 years ago
namisan / exdeep-nmt
☆32Updated 4 years ago
ofirpress / sandwich_transformer
This repository contains the code for running the character-level Sandwich Transformers from our ACL 2020 paper on Improving Transformer …
☆55Updated 4 years ago
lucidrains / multistream-transformers
Implementation of Multistream Transformers in Pytorch
☆54Updated 4 years ago
FreedomIntelligence / complex-order
☆84Updated 5 years ago
yaohungt / TransformerDissection
[EMNLP'19] Summary for Transformer Understanding
☆53Updated 5 years ago
jackyyy0228 / Order-free-Learning-Alleviating-Exposure-Bias-in-Multi-label-Classification
☆20Updated 5 years ago
lucidrains / memformer
Implementation of Memformer, a Memory-augmented Transformer, in Pytorch
☆123Updated 4 years ago
zlinao / Variational-Transformer
Variational Transformers for Diverse Response Generation
☆81Updated last year
facebookresearch / DisCo
DisCo Transformer for Non-autoregressive MT
☆77Updated 3 years ago
RayeRen / multilingual-kd-pytorch
ICLR2019, Multilingual Neural Machine Translation with Knowledge Distillation
☆70Updated 5 years ago