soskek / attention_is_all_you_need
Transformer from "Attention Is All You Need" (Vaswani et al. 2017), implemented in Chainer.
☆321 · Updated 7 years ago
Alternatives and similar repositories for attention_is_all_you_need
Users who are interested in attention_is_all_you_need are comparing it to the libraries listed below.
- Recurrent Highway Networks - Implementations for Tensorflow, Torch7, Theano and Brainstorm ☆402 · Updated 5 years ago
- ByteNet for character-level language modelling ☆319 · Updated 7 years ago
- Language Modeling ☆156 · Updated 5 years ago
- Tensorflow implementation of "Language Modeling with Gated Convolutional Networks" ☆271 · Updated 8 years ago
- A TensorFlow implementation of Fairseq Convolutional Sequence to Sequence Learning (Gehring et al. 2017) ☆305 · Updated 8 years ago
- QRNN implementation for TensorFlow ☆236 · Updated 2 years ago
- Batch normalized LSTM for tensorflow ☆179 · Updated 8 years ago
- Mixed Incremental Cross-Entropy REINFORCE (ICLR 2016) ☆331 · Updated 8 years ago
- Adaptive Computation Time algorithm in Tensorflow ☆256 · Updated 8 years ago
- TensorFlow implementation of "Tracking the World State with Recurrent Entity Networks". ☆273 · Updated 7 years ago
- attention model for entailment on SNLI corpus implemented in Tensorflow and Keras ☆177 · Updated 8 years ago
- Code for Stanford CS224D: deep learning for natural language understanding ☆223 · Updated 5 years ago
- Sequence-to-Sequence learning using PyTorch ☆521 · Updated 5 years ago
- ☆395 · Updated 6 years ago
- ☆165 · Updated 8 years ago
- Generative adversarial networks (GAN) applied to sequential data via recurrent neural networks (RNN). ☆395 · Updated 8 years ago
- End-To-End Memory Network using Tensorflow ☆343 · Updated 8 years ago
- Tensorflow implementation for DilatedRNN ☆349 · Updated 7 years ago
- ☆617 · Updated 8 years ago
- Hierarchical Encoder Decoder RNN (HRED) with Truncated Backpropagation Through Time (Truncated BPTT) ☆307 · Updated 5 years ago
- Code for Structured Attention Networks https://arxiv.org/abs/1702.00887 ☆238 · Updated 8 years ago
- Implements an efficient softmax approximation as described in the paper "Efficient softmax approximation for GPUs" (http://arxiv.org/abs/… ☆395 · Updated 6 years ago
- Nested LSTM Cell ☆251 · Updated 7 years ago
- ☆218 · Updated 9 years ago
- A tutorial about neural machine translation including tips on building practical systems ☆368 · Updated 8 years ago
- [unmaintained] Make seq2seq for keras work ☆232 · Updated 8 years ago
- ☆143 · Updated 7 years ago
- ☆167 · Updated 8 years ago
- MXNet based Neural Machine Translation ☆118 · Updated 6 years ago
- TensorFlow implementation of normalizations such as Layer Normalization, HyperNetworks. ☆111 · Updated 8 years ago