fawazsammani / mogrifier-lstm-pytorch
Implementation of Mogrifier LSTM in PyTorch
☆35 · Updated 4 years ago
Alternatives and similar repositories for mogrifier-lstm-pytorch
Users who are interested in mogrifier-lstm-pytorch are comparing it to the repositories listed below:
- A quick walk-through of the innards of LSTMs and a naive implementation of the Mogrifier LSTM paper in PyTorch ☆74 · Updated 4 years ago
- Learning to Encode Position for Transformer with Continuous Dynamical Model ☆59 · Updated 4 years ago
- PyTorch implementation of Performer from the paper "Rethinking Attention with Performers". ☆24 · Updated 4 years ago
- Code for Explicit Sparse Transformer ☆60 · Updated last year
- How Does Selective Mechanism Improve Self-attention Networks? ☆27 · Updated 3 years ago
- Unofficial PyTorch implementation of the paper "cosFormer: Rethinking Softmax In Attention". ☆44 · Updated 3 years ago
- Code for "Understanding and Improving Layer Normalization" ☆46 · Updated 5 years ago
- ☆20 · Updated 5 years ago
- Reproducing the Linear Multihead Attention introduced in the Linformer paper (Linformer: Self-Attention with Linear Complexity) ☆73 · Updated 4 years ago
- Categorical variational autoencoder using the Gumbel-Softmax estimator ☆25 · Updated 5 years ago
- Code for the paper "Continual and Multi-Task Architecture Search" (ACL 2019) ☆41 · Updated 5 years ago
- Sparse Attention with Linear Units ☆17 · Updated 3 years ago
- Official repository for Reliable Label Bootstrapping ☆19 · Updated last year
- My implementation of the gMLP model from the paper "Pay Attention to MLPs". ☆24 · Updated 3 years ago
- ☆23 · Updated 4 years ago
- MTAdam: Automatic Balancing of Multiple Training Loss Terms ☆36 · Updated 4 years ago
- ☆33 · Updated 3 years ago
- ICLR 2021 (spotlight): Graph Convolution with Low-rank Learnable Local Filters ☆15 · Updated 4 years ago
- Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Context Terms ☆20 · Updated 3 years ago
- [EMNLP'19] Summary for Transformer Understanding ☆53 · Updated 5 years ago
- PyTorch implementation of Pay Attention to MLPs ☆40 · Updated 3 years ago
- Improving generalization by controlling label-noise information in neural network weights. ☆40 · Updated 4 years ago
- NeurIPS'19: Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting (PyTorch implementation for class imbalance). ☆33 · Updated 5 years ago
- Code for the paper "Gaussian Transformer: A Lightweight Approach for Natural Language Inference" ☆28 · Updated 4 years ago
- A PyTorch implementation of the Compact Multi-Head Self-Attention Mechanism from the paper: "Low Rank Factorization for Compact Multi-Hea… ☆24 · Updated 5 years ago
- The implementation of multi-branch attentive Transformer (MAT). ☆33 · Updated 4 years ago
- An implementation of "Synthesizer: Rethinking Self-Attention in Transformer Models" in PyTorch ☆70 · Updated 4 years ago
- ☆39 · Updated 4 years ago
- A PyTorch implementation of Adafactor (https://arxiv.org/pdf/1804.04235.pdf) ☆23 · Updated 5 years ago
- A PyTorch implementation of the paper "Synthesizer: Rethinking Self-Attention in Transformer Models" ☆72 · Updated 2 years ago