yzh119 / BPT

Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"

☆128

Alternatives and similar repositories for BPT

Users who are interested in BPT are comparing it to the repositories listed below.

gonglinyuan / StackingBERT
Source code for "Efficient Training of BERT by Progressively Stacking"
☆113 Updated 6 years ago
zhuohan123 / macaron-net
Codes for "Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View"
☆148 Updated 6 years ago
lancopku / Prime
A simple module consistently outperforms self-attention and Transformer model on main NMT datasets with SoTA performance.
☆86 Updated 2 years ago
microsoft / Unicoder
Unicoder model for understanding and generation.
☆92 Updated last year
RayeRen / multilingual-kd-pytorch
ICLR2019, Multilingual Neural Machine Translation with Knowledge Distillation
☆70 Updated 5 years ago
fuzihaofzh / repetition-problem-nlg
Code for the paper "A Theoretical Analysis of the Repetition Problem in Text Generation" in AAAI 2021.
☆56 Updated 3 years ago
guolinke / TUPE
Transformer with Untied Positional Encoding (TUPE). Code of paper "Rethinking Positional Encoding in Language Pre-training". Improve exis…
☆252 Updated 4 years ago
lemmonation / jm-nat
Code for ACL2020 "Jointly Masked Sequence-to-Sequence Model for Non-Autoregressive Neural Machine Translation"
☆39 Updated 5 years ago
pmichel31415 / are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
☆173 Updated 5 years ago
facebookresearch / DisCo
DisCo Transformer for Non-autoregressive MT
☆77 Updated 3 years ago
microsoft / DualLearning
A dual learning toolkit developed by Microsoft Research
☆72 Updated 2 years ago
FreedomIntelligence / complex-order
☆84 Updated 6 years ago
microsoft / Transformer-XH
☆70 Updated 5 years ago
Sanyuan-Chen / RecAdam
Code for the RecAdam paper: Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting.
☆118 Updated 5 years ago
serrano-s / attn-tests
Checking the interpretability of attention on text classification models
☆49 Updated 6 years ago
MC-BERT / MC-BERT
☆96 Updated 5 years ago
dojoteef / synst
Source code to reproduce the results in the ACL 2019 paper "Syntactically Supervised Transformers for Faster Neural Machine Translation"
☆81 Updated 3 years ago
microsoft / EA-VQ-VAE
This repo provides the code for the ACL 2020 paper "Evidence-Aware Inferential Text Generation with Vector Quantised Variational AutoEnco…
☆55 Updated 4 years ago
eaglenlp / Text-Generation
☆94 Updated 5 years ago
wellecks / nonmonotonic_text
Non-Monotonic Sequential Text Generation (ICML 2019)
☆72 Updated 6 years ago
laiguokun / Funnel-Transformer
☆219 Updated 5 years ago
intersun / CoDIR
Code for EMNLP 2020 paper CoDIR
☆41 Updated 3 years ago
harvardnlp / cascaded-generation
Cascaded Text Generation with Markov Transformers
☆129 Updated 2 years ago
layer6ai-labs / T-Fixup
Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"
☆89 Updated 4 years ago
hengyicai / ContrastiveLearning4Dialogue
The codebase for "Group-wise Contrastive Learning for Neural Dialogue Generation" (Cai et al., Findings of EMNLP 2020)
☆55 Updated 4 years ago
zhaocq-nlp / Attention-Visualization
Visualization for simple attention and Google's multi-head attention.
☆68 Updated 7 years ago
jxhe / self-training-text-generation
Implementation of ICLR 2020 paper "Revisiting Self-Training for Neural Sequence Generation"
☆46 Updated 3 years ago
dguo98 / SeqMix
Sequence-Level Mixed Sample Data Augmentation
☆22 Updated 4 years ago
hfxunlp / transformer
Neutron: A pytorch based implementation of Transformer and its variants.
☆64 Updated 2 years ago
Noahs-ARK / MAE
☆21 Updated 5 years ago