guolinke / TUPE

Transformer with Untied Positional Encoding (TUPE). Code for the paper "Rethinking Positional Encoding in Language Pre-training". Improves existing models like BERT.

☆252

Alternatives and similar repositories for TUPE

Users who are interested in TUPE are comparing it to the libraries listed below.

yzh119 / BPT
Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"
☆128 Updated 4 years ago
laiguokun / Funnel-Transformer
☆219 Updated 5 years ago
facebookresearch / Mask-Predict
A masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a…
☆245 Updated 4 years ago
LiyuanLucasLiu / Transformer-Clinic
Understanding the Difficulty of Training Transformers
☆330 Updated 3 years ago
asappresearch / revisit-bert-finetuning
For the code release of our arXiv paper "Revisiting Few-sample BERT Fine-tuning" (https://arxiv.org/abs/2006.05987).
☆184 Updated 2 years ago
JetRunner / BERT-of-Theseus
⛵️ The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).
☆315 Updated 2 years ago
Sanyuan-Chen / RecAdam
Code for the RecAdam paper: Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting.
☆118 Updated 4 years ago
FreedomIntelligence / complex-order
☆84 Updated 5 years ago
lonePatient / electra_pytorch
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
☆91 Updated 4 years ago
pmichel31415 / are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
☆172 Updated 5 years ago
yitu-opensource / ConvBert
☆254 Updated 3 years ago
yaushian / Tree-Transformer
Implementation of the paper "Tree Transformer"
☆214 Updated 5 years ago
XuezheMax / flowseq
Generative Flow based Sequence-to-Sequence Toolkit written in Python.
☆246 Updated 5 years ago
budzianowski / PyTorch-Beam-Search-Decoding
PyTorch implementation of beam search decoding for seq2seq models
☆339 Updated 2 years ago
lemmonation / jm-nat
Code for ACL 2020 "Jointly Masked Sequence-to-Sequence Model for Non-Autoregressive Neural Machine Translation"
☆39 Updated 5 years ago
graykode / ALBERT-Pytorch
PyTorch implementation of ALBERT (A Lite BERT for Self-supervised Learning of Language Representations)
☆227 Updated 4 years ago
kahne / NonAutoregGenProgress
Tracking the progress in non-autoregressive generation (translation, transcription, etc.)
☆305 Updated 2 years ago
ZhengZixiang / ATPapers
Worth-reading papers and related resources on attention mechanism, Transformer and pretrained language model (PLM) such as BERT. Worth-reading attention…
☆130 Updated 4 years ago
microsoft / EA-VQ-VAE
This repo provides the code for the ACL 2020 paper "Evidence-Aware Inferential Text Generation with Vector Quantised Variational AutoEnco…
☆55 Updated 4 years ago
salesforce / nonauto-nmt
PyTorch Implementation of "Non-Autoregressive Neural Machine Translation"
☆271 Updated 3 years ago
layer6ai-labs / T-Fixup
Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"
☆89 Updated 4 years ago
yistLin / pytorch-dual-learning
Implementation of Dual Learning NMT on PyTorch
☆163 Updated 7 years ago
zhuohan123 / macaron-net
Code for "Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View"
☆148 Updated 6 years ago
ChunyuanLI / Optimus
Optimus: the first large-scale pre-trained VAE language model
☆391 Updated 2 years ago
fastnlp / style-transformer
☆179 Updated 3 years ago
jcyk / gtos
Code for AAAI 2020 paper "Graph Transformer for Graph-to-Sequence Learning"
☆190 Updated last year
ChenRocks / Distill-BERT-Textgen
Research code for ACL 2020 paper: "Distilling Knowledge Learned in BERT for Text Generation".
☆129 Updated 4 years ago
AsaCooperStickland / Bert-n-Pals
PyTorch implementation of BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning (https://arxiv.org/ab…
☆84 Updated 6 years ago
microsoft / COCO-LM
[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
☆117 Updated 2 years ago
MC-BERT / MC-BERT
☆97 Updated 5 years ago