ictnlp / awesome-transformer
A collection of Transformer guides, implementations, and variants.
☆102 · Updated 4 years ago
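For orientation, here is a minimal PyTorch sketch of scaled dot-product attention, the core operation of the Transformer from "Attention Is All You Need" that most of the repositories below build on. This is an illustrative sketch of our own (the function name and tensor shapes are assumptions), not code taken from any listed project.

```python
# Minimal sketch of scaled dot-product attention (illustrative only,
# not taken from any repository listed here).
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, heads, len, d_k); mask: True marks attendable positions."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # (B, H, L, L)
    if mask is not None:
        scores = scores.masked_fill(~mask, float("-inf"))     # block masked slots
    weights = F.softmax(scores, dim=-1)                       # attention distribution
    return weights @ v, weights

# Smoke test with random tensors.
q = k = v = torch.randn(2, 4, 8, 16)
out, attn = scaled_dot_product_attention(q, k, v)
assert out.shape == (2, 4, 8, 16) and attn.shape == (2, 4, 8, 8)
```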
Related projects
Alternatives and complementary repositories for awesome-transformer
- Worth-reading papers and related resources on attention mechanisms, Transformers, and pretrained language models (PLMs) such as BERT. ☆132 · Updated 3 years ago
- This project attempts to maintain SOTA performance in machine translation. ☆108 · Updated 4 years ago
- DisCo Transformer for Non-autoregressive MT ☆78 · Updated 2 years ago
- Code for the paper "Are Sixteen Heads Really Better than One?" ☆168 · Updated 4 years ago
- Some good (maybe) papers about NMT (Neural Machine Translation). ☆84 · Updated 4 years ago
- Understanding the Difficulty of Training Transformers ☆326 · Updated 2 years ago
- Code release for our arXiv paper "Revisiting Few-sample BERT Fine-tuning" (https://arxiv.org/abs/2006.05987). ☆184 · Updated last year
- ICLR 2019: Multilingual Neural Machine Translation with Knowledge Distillation ☆70 · Updated 4 years ago
- Source code for the ACL 2019 paper "Bridging the Gap between Training and Inference for Neural Machine Translation" ☆41 · Updated 4 years ago
- ☆94 · Updated 3 years ago
- Implementation of "Glancing Transformer for Non-Autoregressive Neural Machine Translation" ☆136 · Updated last year
- Code for the RecAdam paper "Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting". ☆115 · Updated 4 years ago
- Code for the ICML 2020 paper "Improving Transformer Optimization Through Better Initialization" ☆89 · Updated 3 years ago
- PyTorch implementation of "Non-Autoregressive Neural Machine Translation" ☆269 · Updated 2 years ago
- Code for the NeurIPS 2020 paper "Incorporating BERT into Parallel Sequence Decoding with Adapters" ☆32 · Updated 2 years ago
- A simple module that consistently outperforms self-attention and the Transformer on the main NMT datasets, with SOTA performance. ☆87 · Updated last year
- A PyTorch implementation of the Transformer from "Attention Is All You Need" ☆103 · Updated 3 years ago
- ☆95 · Updated 2 years ago
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators ☆92 · Updated 3 years ago
- Source code for "Efficient Training of BERT by Progressively Stacking" ☆111 · Updated 5 years ago
- ☆120 · Updated 5 years ago
- The implementation of "Learning Deep Transformer Models for Machine Translation" ☆114 · Updated 3 months ago
- ☆13 · Updated 5 years ago
- Source code of the paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning" ☆125 · Updated 3 years ago
- Cascaded Text Generation with Markov Transformers ☆128 · Updated last year
- ☆83 · Updated 4 years ago
- Transformers without Tears: Improving the Normalization of Self-Attention (pre-norm; see the sketch after this list) ☆130 · Updated 5 months ago
- [ACL 2020] Highway Transformer: A Gated Transformer. ☆32 · Updated 2 years ago
- Code for the AAAI 2021 paper "A Theoretical Analysis of the Repetition Problem in Text Generation". ☆51 · Updated 2 years ago
- Tracking the progress in non-autoregressive generation (translation, transcription, etc.) ☆305 · Updated last year
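Several entries above ("Understanding the Difficulty of Training Transformers", "Improving Transformer Optimization Through Better Initialization", "Transformers without Tears", "Learning Deep Transformer Models for Machine Translation") center on where layer normalization sits relative to the residual connection. As a hedged illustration of the pre-norm ordering those works study (our own sketch with assumed dimensions, not code from any listed repository):

```python
# Pre-norm Transformer encoder block: LayerNorm is applied before each
# sublayer and the residual is added afterwards. Post-norm (the original
# Transformer) instead normalizes after the residual add; pre-norm is
# commonly reported to train more stably at depth.
import torch
import torch.nn as nn

class PreNormEncoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.drop = nn.Dropout(dropout)

    def forward(self, x, attn_mask=None):
        h = self.norm1(x)                          # normalize first...
        h, _ = self.attn(h, h, h, attn_mask=attn_mask)
        x = x + self.drop(h)                       # ...then the residual add
        x = x + self.drop(self.ff(self.norm2(x)))  # same pattern for the FFN
        return x

x = torch.randn(2, 10, 512)
assert PreNormEncoderBlock()(x).shape == x.shape
```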