CyberZHG / torch-multi-head-attention

Multi-head attention in PyTorch

☆154

Alternatives and similar repositories for torch-multi-head-attention

Users who are interested in torch-multi-head-attention are comparing it to the libraries listed below.

evelinehong / Transformer_Relative_Position_PyTorch
Implement the paper "Self-Attention with Relative Position Representations"
☆139 Updated 4 years ago
brianlan / pytorch-grad-norm
Pytorch implementation of the GradNorm. GradNorm addresses the problem of balancing multiple losses for multi-task learning by learning a…
☆271 Updated 3 years ago
hosseinshn / GradNorm
This is my demo of Chen et al. "GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks" ICML 2018
☆181 Updated 3 years ago
wangz10 / contrastive_loss
Experiments with supervised contrastive learning methods with different loss functions
☆221 Updated 2 years ago
hosseinshn / Basic-Multi-task-Learning
This is a repository for Multi-task learning with toy data in Pytorch and Tensorflow
☆137 Updated 7 years ago
leaderj1001 / Synthesizer-Rethinking-Self-Attention-Transformer-Models
Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using Pytorch
☆70 Updated 5 years ago
FreedomIntelligence / complex-order
☆84 Updated 5 years ago
oscarkey / multitask-learning
MSc group project: Reproduction of 'Multi-Task Learning using Uncertainty to Weigh Losses for Scene Geometry and Semantics'; A. Kendall, …
☆91 Updated 5 years ago
Spijkervet / contrastive-predictive-coding
PyTorch implementation of Representation Learning with Contrastive Predictive Coding by Van den Oord et al. (2018)
☆88 Updated 3 years ago
wuch15 / Fastformer
A PyTorch & Keras implementation and demo of Fastformer.
☆190 Updated 3 years ago
rishikksh20 / FNet-pytorch
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
☆259 Updated 4 years ago
JasonZhang156 / awesome-mixed-sample-data-augmentation
A collection of awesome things about mixed sample data augmentation
☆132 Updated 5 years ago
kuixu / Linear-Multihead-Attention
Reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer: Self-Attention with Linear Complexity)
☆75 Updated 5 years ago
GuillaumeErhard / Supervised_contrastive_loss_pytorch
Independent implementation of Supervised Contrastive Loss. Straight to the point and beyond
☆84 Updated 4 years ago
Hui-Li / multi-task-learning-example-PyTorch
☆148 Updated 3 years ago
tatp22 / linformer-pytorch
My take on a practical implementation of Linformer for Pytorch.
☆421 Updated 3 years ago
DSE-MSU / R-transformer
Pytorch implementation of R-Transformer. Some parts of the code are adapted from the implementation of TCN and Transformer.
☆230 Updated 6 years ago
helloyide / Cross-stitch-Networks-for-Multi-task-Learning
A Tensorflow implementation of the paper arXiv:1604.03539
☆134 Updated 7 years ago
10-zin / Synthesizer
A PyTorch implementation of the paper - "Synthesizer: Rethinking Self-Attention in Transformer Models"
☆73 Updated 2 years ago
Justin1904 / Low-rank-Multimodal-Fusion
This is the repository for "Efficient Low-rank Multimodal Fusion with Modality-Specific Factors", Liu and Shen, et al. ACL 2018
☆266 Updated 5 years ago
thegregyang / LossUpAccUp
Loss and accuracy go opposite ways...right?
☆95 Updated 5 years ago
lzy1732008 / GaussionTransformer
For the paper "Gaussian Transformer: A Lightweight Approach for Natural Language Inference"
☆28 Updated 5 years ago
cloneofsimo / realformer-pytorch
Implementation of RealFormer using pytorch
☆101 Updated 4 years ago
RMichaelSwan / MogrifierLSTM
A quick walk-through of the innards of LSTMs and a naive implementation of the Mogrifier LSTM paper in PyTorch
☆78 Updated 5 years ago
OpenNLPLab / cosFormer
[ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention
☆196 Updated 2 years ago
Stonesjtu / Pytorch-NCE
The Noise Contrastive Estimation for softmax output written in Pytorch
☆319 Updated 5 years ago
xuanqing94 / FLOATER
Learning to Encode Position for Transformer with Continuous Dynamical Model
☆59 Updated 5 years ago
choosewhatulike / sparse-sharing
Code for "Learning Sparse Sharing Architectures for Multiple Tasks"
☆95 Updated 5 years ago
jaketae / g-mlp
PyTorch implementation of Pay Attention to MLPs
☆41 Updated 4 years ago
zy1996code / nlp_basic_model
Some basic deep learning models/methods for NLP and text classification.
☆79 Updated 5 years ago