dreamgonfly / transformer-pytorch

A PyTorch implementation of the Transformer from "Attention Is All You Need"

☆106

Alternatives and similar repositories for transformer-pytorch

Users who are interested in transformer-pytorch are comparing it to the libraries listed below.


dreamgonfly / BERT-pytorch
PyTorch implementation of BERT in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"
☆107 · Updated 6 years ago
cloneofsimo / realformer-pytorch
Implementation of RealFormer using PyTorch
☆100 · Updated 4 years ago
pmichel31415 / are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
☆172 · Updated 5 years ago
312shan / Pytorch-seq2seq-Beam-Search
PyTorch implementation of a Seq2Seq model with attention and greedy search / beam search for neural machine translation
☆58 · Updated 4 years ago
10-zin / Synthesizer
A PyTorch implementation of the paper "Synthesizer: Rethinking Self-Attention in Transformer Models"
☆73 · Updated 2 years ago
maknotavailable / pytorch-pretrained-BERT
A PyTorch implementation of Google AI's BERT model provided with Google's pre-trained models, examples and utilities.
☆71 · Updated 3 years ago
shehzaadzd / pytorch-pretrained-BERT
A PyTorch implementation of Google AI's BERT model provided with Google's pre-trained models, examples and utilities.
☆35 · Updated 6 years ago
Rick-McCoy / Reformer-pytorch
Implements "Reformer: The Efficient Transformer" in PyTorch.
☆86 · Updated 5 years ago
tnq177 / transformers_without_tears
Transformers without Tears: Improving the Normalization of Self-Attention
☆132 · Updated last year
lonePatient / electra_pytorch
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
☆91 · Updated 3 years ago
budzianowski / PyTorch-Beam-Search-Decoding
PyTorch implementation of beam search decoding for seq2seq models
☆337 · Updated 2 years ago
TensorUI / relative-position-pytorch
A PyTorch implementation of self-attention with relative position representations
☆50 · Updated 4 years ago
microsoft / Unicoder
Unicoder model for understanding and generation.
☆91 · Updated last year
joongbo / tta
Repository for the paper "Fast and Accurate Deep Bidirectional Language Representations for Unsupervised Learning"
☆109 · Updated 4 years ago
bloodwass / mixout
Implementation of Mixout with PyTorch
☆75 · Updated 2 years ago
RayeRen / multilingual-kd-pytorch
ICLR 2019: Multilingual Neural Machine Translation with Knowledge Distillation
☆70 · Updated 4 years ago
ictnlp / awesome-transformer
A collection of Transformer guides, implementations and variants.
☆105 · Updated 5 years ago
kaushalshetty / Positional-Encoding
Encoding position with the word embeddings.
☆83 · Updated 7 years ago
asappresearch / revisit-bert-finetuning
For the code release of our arXiv paper "Revisiting Few-sample BERT Fine-tuning" (https://arxiv.org/abs/2006.05987).
☆184 · Updated 2 years ago
graykode / ALBERT-Pytorch
PyTorch implementation of ALBERT (A Lite BERT for Self-supervised Learning of Language Representations)
☆226 · Updated 4 years ago
IBM / PoWER-BERT
Method to improve inference time for BERT. This is an implementation of the paper titled "PoWER-BERT: Accelerating BERT Inference via Pro…
☆61 · Updated 2 months ago
b-etienne / Seq2seq-PyTorch
☆76 · Updated 5 years ago
HongyuGong / TextStyleTransfer
Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus
☆27 · Updated 6 years ago
varunkumar-dev / TransformersDataAugmentation
Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper
☆133 · Updated 2 years ago
JetRunner / PABEE
Code for the paper "BERT Loses Patience: Fast and Robust Inference with Early Exit".
☆65 · Updated 4 years ago
leaderj1001 / Synthesizer-Rethinking-Self-Attention-Transformer-Models
Implementing "SYNTHESIZER: Rethinking Self-Attention in Transformer Models" using PyTorch
☆70 · Updated 5 years ago
andrewpeng02 / transformer-translation
Using PyTorch's nn.Transformer module to create an English-to-French neural machine translation model.
☆78 · Updated 4 years ago
FreedomIntelligence / complex-order
☆83 · Updated 5 years ago
castorini / DeeBERT
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
☆157 · Updated 3 years ago
takase / share_layer_params
☆28 · Updated 3 years ago