TensorUI / relative-position-pytorch

A PyTorch implementation of self-attention with relative position representations

☆50

Alternatives and similar repositories for relative-position-pytorch

Users who are interested in relative-position-pytorch are comparing it to the libraries listed below.

yaushian / Tree-Transformer
Implementation of the paper Tree Transformer
☆214 Updated 5 years ago
FreedomIntelligence / complex-order
☆83 Updated 5 years ago
RayeRen / multilingual-kd-pytorch
ICLR2019, Multilingual Neural Machine Translation with Knowledge Distillation
☆70 Updated 4 years ago
lemmonation / jm-nat
Code for ACL2020 "Jointly Masked Sequence-to-Sequence Model for Non-Autoregressive Neural Machine Translation"
☆39 Updated 5 years ago
cooelf / UVR-NMT
Neural Machine Translation with universal Visual Representation (ICLR 2020)
☆89 Updated 5 years ago
microsoft / Unicoder
Unicoder model for understanding and generation.
☆91 Updated last year
WHUIR / PPVAE
The official Keras implementation of ACL 2020 paper "Pre-train and Plug-in: Flexible Conditional Text Generation with Variational Auto-En…
☆48 Updated 2 years ago
shehzaadzd / pytorch-pretrained-BERT
A PyTorch implementation of Google AI's BERT model provided with Google's pre-trained models, examples and utilities.
☆35 Updated 6 years ago
ChenRocks / Distill-BERT-Textgen
Research code for ACL 2020 paper: "Distilling Knowledge Learned in BERT for Text Generation".
☆131 Updated 4 years ago
asappresearch / revisit-bert-finetuning
For the code release of our arXiv paper "Revisiting Few-sample BERT Fine-tuning" (https://arxiv.org/abs/2006.05987).
☆184 Updated 2 years ago
eaglenlp / Text-Generation
☆93 Updated 5 years ago
312shan / Pytorch-seq2seq-Beam-Search
PyTorch implementation for Seq2Seq model with attention and Greedy Search / Beam Search for neural machine translation
☆58 Updated 4 years ago
Sanyuan-Chen / RecAdam
Code for the RecAdam paper: Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting.
☆118 Updated 4 years ago
Cartus / DCGCN
Densely Connected Graph Convolutional Networks for Graph-to-Sequence Learning (authors' MXNet implementation for the TACL19 paper)
☆78 Updated 4 years ago
yzh119 / BPT
Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"
☆128 Updated 4 years ago
lonePatient / electra_pytorch
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
☆91 Updated 3 years ago
10-zin / Synthesizer
A PyTorch implementation of the paper - "Synthesizer: Rethinking Self-Attention in Transformer Models"
☆73 Updated 2 years ago
travel-go / Abstractive-Text-Summarization
Contrastive Attention Mechanism for Abstractive Text Summarization
☆40 Updated 5 years ago
mourga / affective-attention
Source code for the ACL 2019 paper "Attention-based Conditioning Methods for External Knowledge Integration"
☆59 Updated 3 years ago
henryhungle / MTN
Code for the paper Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems (ACL19)
☆100 Updated 2 years ago
ranqiu92 / RecoverSAT
☆18 Updated last year
AsaCooperStickland / Bert-n-Pals
Pytorch implementation of Bert and Pals: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning (https://arxiv.org/ab…
☆83 Updated 6 years ago
ictnlp / OR-NMT
Source code for the ACL 2019 paper "Bridging the Gap between Training and Inference for Neural Machine Translation"
☆41 Updated 4 years ago
hongwang600 / Summarization
☆38 Updated 6 years ago
guxd / DialogWAE
Source Code for DialogWAE: Multimodal Response Generation with Conditional Wasserstein Autoencoder (https://arxiv.org/abs/1805.12352)
☆125 Updated 6 years ago
microsoft / EA-VQ-VAE
This repo provides the code for the ACL 2020 paper "Evidence-Aware Inferential Text Generation with Vector Quantised Variational AutoEnco…
☆55 Updated 4 years ago
yanzhangnlp / IS-BERT
An Unsupervised Sentence Embedding Method by Mutual Information Maximization (EMNLP2020)
☆61 Updated 4 years ago
zhaodongh / Encoding-Word-Order-in-Complex-valued-Embedding
The code of Encoding Word Order in Complex-valued Embedding
☆42 Updated 6 years ago
facebookresearch / DisCo
DisCo Transformer for Non-autoregressive MT
☆77 Updated 3 years ago
zhaocq-nlp / Attention-Visualization
Visualization for simple attention and Google's multi-head attention.
☆67 Updated 7 years ago