vudaoanhtuan / Transformer
Transformer, Evolved Transformer Model
☆10 Updated 6 years ago
Alternatives and similar repositories for Transformer
Users who are interested in Transformer are comparing it to the repositories listed below.
- Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using Pytorch ☆70 Updated 5 years ago
- Visualization for simple attention and Google's multi-head attention. ☆68 Updated 7 years ago
- A PyTorch implementation of Transformer in "Attention is All You Need" ☆106 Updated 4 years ago
- The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Natu… ☆48 Updated 4 years ago
- UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning ☆70 Updated 4 years ago
- Source code for "Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation" ☆18 Updated 6 years ago
- A PyTorch implementation of the paper - "Synthesizer: Rethinking Self-Attention in Transformer Models" ☆73 Updated 2 years ago
- The implementation of multi-branch attentive Transformer (MAT). ☆33 Updated 5 years ago
- Code for paper "Continual and Multi-Task Architecture Search (ACL 2019)" ☆41 Updated 6 years ago
- ICLR2019, Multilingual Neural Machine Translation with Knowledge Distillation ☆70 Updated 4 years ago
- Code for "Understanding and Improving Layer Normalization" ☆46 Updated 5 years ago
- Implementation of RealFormer using pytorch ☆101 Updated 4 years ago
- Code for "simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions" (EMNLP 2018) ☆36 Updated 6 years ago
- Implementation of a Quantized Transformer Model ☆19 Updated 6 years ago
- Code for EMNLP 2020 paper CoDIR ☆41 Updated 2 years ago
- Zero-Shot Knowledge Distillation in Deep Networks in ICML2019 ☆49 Updated 6 years ago
- Source code for "Efficient Training of BERT by Progressively Stacking" ☆113 Updated 6 years ago
- tf2.0 implementation of circle loss ☆32 Updated 5 years ago
- A dual learning toolkit developed by Microsoft Research ☆71 Updated 2 years ago
- Source code for NAACL 2021 paper "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference" ☆47 Updated 3 years ago
- Curriculum Learning related papers and materials ☆54 Updated 4 years ago
- Unicoder model for understanding and generation. ☆91 Updated last year
- ☆20 Updated 5 years ago
- ☆18 Updated last year
- A simple module consistently outperforms self-attention and Transformer model on main NMT datasets with SoTA performance. ☆85 Updated 2 years ago
- A PyTorch implementation of: Language Modeling with Gated Convolutional Networks. ☆100 Updated 3 years ago
- Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning" ☆128 Updated 4 years ago
- DisCo Transformer for Non-autoregressive MT ☆77 Updated 3 years ago
- tunz's CUDA pytorch operator (MaskedSoftmax) ☆75 Updated 6 years ago
- Various implementations and experimentation for deep neural network model compression ☆24 Updated 6 years ago