tnq177 / transformers_without_tears

Transformers without Tears: Improving the Normalization of Self-Attention

☆134

Alternatives and similar repositories for transformers_without_tears

Users who are interested in transformers_without_tears compare it to the libraries listed below.

laiguokun / Funnel-Transformer
☆219 Updated 5 years ago
facebookresearch / DisCo
DisCo Transformer for Non-autoregressive MT
☆77 Updated 3 years ago
XuezheMax / fairseq-apollo
FairSeq repo with Apollo optimizer
☆114 Updated last year
layer6ai-labs / T-Fixup
Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"
☆89 Updated 4 years ago
bloodwass / mixout
Implementation of Mixout with PyTorch
☆75 Updated 2 years ago
harvardnlp / cascaded-generation
Cascaded Text Generation with Markov Transformers
☆129 Updated 2 years ago
ofirpress / sandwich_transformer
This repository contains the code for running the character-level Sandwich Transformers from our ACL 2020 paper on Improving Transformer …
☆55 Updated 4 years ago
IBM / PoWER-BERT
Method to improve inference time for BERT. This is an implementation of the paper titled "PoWER-BERT: Accelerating BERT Inference via Pro…
☆62 Updated 2 months ago
zomux / lanmt
LaNMT: Latent-variable Non-autoregressive Neural Machine Translation with Deterministic Inference
☆79 Updated 4 years ago
clovaai / length-adaptive-transformer
Official Pytorch Implementation of Length-Adaptive Transformer (ACL 2021)
☆102 Updated 5 years ago
pmichel31415 / are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
☆173 Updated 5 years ago
namisan / exdeep-nmt
☆32 Updated 4 years ago
jungokasai / deep-shallow
☆44 Updated 5 years ago
RayeRen / multilingual-kd-pytorch
ICLR2019, Multilingual Neural Machine Translation with Knowledge Distillation
☆70 Updated 5 years ago
UriSha / EmbeddinglessNMT
The implementation of "Neural Machine Translation without Embeddings", NAACL 2021
☆33 Updated 4 years ago
facebookresearch / Mask-Predict
A masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a…
☆245 Updated 4 years ago
lancopku / Prime
A simple module consistently outperforms self-attention and Transformer model on main NMT datasets with SoTA performance.
☆86 Updated 2 years ago
allenai / sledgehammer
☆48 Updated 5 years ago
bzhangGo / zero
Zero -- A neural machine translation system
☆153 Updated 2 years ago
zbloss / reformer_lm
a Pytorch implementation of the Reformer Network (https://openreview.net/pdf?id=rkgNKkHtvB)
☆53 Updated 3 years ago
microsoft / infinibatch
Efficient, check-pointed data loading for deep learning with massive data sets.
☆210 Updated 2 years ago
dreamgonfly / BERT-pytorch
PyTorch implementation of BERT in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"
☆110 Updated 7 years ago
cindyxinyiwang / SDE
Source code for the paper "Multilingual Neural Machine Translation with Soft Decoupled Encoding"
☆29 Updated 4 years ago
nyu-dl / dl4mt-nonauto
☆119 Updated 6 years ago
lucidrains / charformer-pytorch
Implementation of the GBST block from the Charformer paper, in Pytorch
☆119 Updated 4 years ago
LiyuanLucasLiu / Transformer-Clinic
Understanding the Difficulty of Training Transformers
☆332 Updated 3 years ago
uds-lsv / bert-stable-fine-tuning
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
☆137 Updated 2 years ago
ofirpress / shortformer
Code for the Shortformer model, from the ACL 2021 paper by Ofir Press, Noah A. Smith and Mike Lewis.
☆147 Updated 4 years ago
noe / fairseq-tensorboard
Small utility to monitor fairseq training in tensorboard
☆21 Updated 6 years ago
allenai / tpu_pretrain
LM Pretraining with PyTorch/TPU
☆136 Updated 6 years ago