CyberZHG / torch-multi-head-attention
Multi-head attention in PyTorch
☆149 · Updated 5 years ago
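Since the repository implements multi-head attention in PyTorch, a minimal sketch of the technique may help when comparing the alternatives below. This is an illustrative implementation of scaled dot-product multi-head attention (Vaswani et al., 2017), not the package's actual API; the class and parameter names here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    """Illustrative scaled dot-product multi-head attention (Vaswani et al., 2017)."""

    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # One linear projection each for queries, keys, values, and the output.
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, query, key, value, mask=None):
        batch, _, embed_dim = query.shape

        # Project, then split into heads: (batch, heads, seq, head_dim).
        def split(x, proj):
            return proj(x).view(batch, -1, self.num_heads, self.head_dim).transpose(1, 2)

        q, k, v = split(query, self.q_proj), split(key, self.k_proj), split(value, self.v_proj)
        # Attention weights: softmax(Q K^T / sqrt(d_k)).
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        # Weighted sum of values, then merge the heads back together.
        out = (attn @ v).transpose(1, 2).reshape(batch, -1, embed_dim)
        return self.out_proj(out)

# Usage: self-attention over a batch of 2 sequences of length 10, model dim 64.
x = torch.randn(2, 10, 64)
mha = MultiHeadAttention(embed_dim=64, num_heads=8)
print(mha(x, x, x).shape)  # torch.Size([2, 10, 64])
```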
Alternatives and similar repositories for torch-multi-head-attention:
Users interested in torch-multi-head-attention are comparing it to the libraries listed below
- Implementation of the paper "Self-Attention with Relative Position Representations" ☆125 · Updated 4 years ago
- ☆83 · Updated 5 years ago
- PyTorch implementation of "Representation Learning with Contrastive Predictive Coding" by van den Oord et al. (2018) ☆83 · Updated 3 years ago
- Implementation of "SYNTHESIZER: Rethinking Self-Attention in Transformer Models" in PyTorch ☆70 · Updated 4 years ago
- Star-Transformer in PyTorch ☆27 · Updated 5 years ago
- Experiments with supervised contrastive learning methods using different loss functions ☆218 · Updated 2 years ago
- PyTorch implementation of "Attention Is All You Need" ☆240 · Updated 3 years ago
- PCGrad PyTorch sample code (unofficial) ☆30 · Updated 4 years ago
- ☆64 · Updated 5 years ago
- Learning deep representations by mutual information estimation and maximization ☆324 · Updated 6 years ago
- PyTorch implementation of "Pay Attention to MLPs" ☆40 · Updated 3 years ago
- [ICLR 2022] Official implementation of cosFormer attention from "cosFormer: Rethinking Softmax in Attention" ☆185 · Updated 2 years ago
- Transformer/Transformer-XL/R-Transformer examples and explanations ☆26 · Updated 3 years ago
- PyTorch implementation of R-Transformer; parts of the code are adapted from the TCN and Transformer implementations ☆226 · Updated 5 years ago
- Code for the Explicit Sparse Transformer ☆57 · Updated last year
- Implementation of RealFormer in PyTorch ☆101 · Updated 4 years ago
- Unofficial implementation of Google's "FNet: Mixing Tokens with Fourier Transforms" ☆256 · Updated 3 years ago
- Code for the paper "Gaussian Transformer: A Lightweight Approach for Natural Language Inference" ☆29 · Updated 4 years ago
- PyTorch implementation of a Variational Autoencoder with Gumbel-Softmax distribution ☆204 · Updated 6 years ago
- Minimal RNN classifier with self-attention in PyTorch ☆150 · Updated 3 years ago
- This repository contains various attention mechanisms such as Bahdanau, soft attention, additive attention, hierarchical attention… ☆125 · Updated 3 years ago
- A minimal PyTorch package implementing a gradient reversal layer ☆157 · Updated 2 months ago
- Code for "Understanding and Improving Layer Normalization" ☆46 · Updated 5 years ago
- ☆32 · Updated 4 years ago
- An implementation of the Deep Variational Information Bottleneck in PyTorch (https://arxiv.org/pdf/1612.00410.pdf) ☆33 · Updated 6 years ago
- A PyTorch implementation of the paper "Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks" ☆81 · Updated 6 years ago
- Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time" ☆357 · Updated last year
- PyTorch neural network attention mechanisms ☆148 · Updated 5 years ago
- A PyTorch & Keras implementation and demo of Fastformer ☆188 · Updated 2 years ago
- Reproducing the linear multi-head attention introduced in the Linformer paper ("Linformer: Self-Attention with Linear Complexity") ☆72 · Updated 4 years ago