cghezhang / AdamW
https://arxiv.org/abs/1711.05101
☆17Updated 7 years ago
Alternatives and similar repositories for AdamW
Users who are interested in AdamW are comparing it to the libraries listed below
- Reversible Recurrent Neural Network Pytorch Implementation☆21Updated 7 years ago
- A Toolkit for Training, Tracking, Saving Models and Syncing Results☆62Updated 5 years ago
- tunz's CUDA pytorch operator (MaskedSoftmax)☆75Updated 6 years ago
- Highway networks implemented in PyTorch.☆55Updated 8 years ago
- ☆26Updated 5 years ago
- ☆74Updated 8 years ago
- MXNet implementation of an ICLR 2018 paper: A new method of region embedding for text classification.☆10Updated 6 years ago
- Label Embedding Network☆92Updated 7 years ago
- Code for paper "Continual and Multi-Task Architecture Search (ACL 2019)"☆41Updated 6 years ago
- A TensorFlow implementation of Yin Wenpeng's TACL paper "Attentive Convolution"☆33Updated 7 years ago
- PyTorch implementation of SR-NMT https://arxiv.org/abs/1805.04185v1☆26Updated 7 years ago
- ☆38Updated 8 years ago
- Training RNNs as Fast as CNNs (Simple Recurrent Unit)☆31Updated 7 years ago
- Implementation of Neural Arithmetic Logic Units (https://arxiv.org/pdf/1808.00508.pdf)☆31Updated 6 years ago
- Partially Adaptive Momentum Estimation method in the paper "Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep …☆39Updated 2 years ago
- Efficient Neural Interaction Functions Search for Collaborative Filtering☆18Updated 5 years ago
- Attention is All You Need in Sonnet☆38Updated 8 years ago
- A PyTorch Implementation of "Quasi-Recurrent Neural Networks"☆46Updated 7 years ago
- Source code for "Accelerating Neural Transformer via an Average Attention Network"☆78Updated 6 years ago
- Bi-Directional Block Self-Attention☆122Updated 7 years ago
- Sequential Matching Network implemented by MXNET☆18Updated 6 years ago
- Adaptive embedding and softmax☆17Updated 3 years ago
- Code for EACL '17 paper "Identifying beneficial task relations for multi-task learning in deep neural networks"☆45Updated 8 years ago
- An unofficial PyTorch implementation of the HAN and AdaHAN models presented in the "Learning Visual Question Answering by Bootstrapping H…☆55Updated 7 years ago
- This released code corresponds to the TACL paper "Attentive Convolution". Attentive Convolution aims to generate a vector for two sentences.☆105Updated 7 years ago
- Source code for "Efficient Training of BERT by Progressively Stacking"☆113Updated 6 years ago
- Re-implementation of the Noise Contrastive Estimation algorithm for PyTorch, following "Noise-contrastive estimation: A new estimation pr…☆44Updated 6 years ago
- Source code for "Training Generative Adversarial Networks Via Turing Test".☆13Updated 5 years ago
- Official code of our work, Robust, Transferable Sentence Representations for Text Classification [Arxiv 2018].☆22Updated 6 years ago
- RAdam optimizer for Keras☆71Updated 5 years ago