yzh119 / BPT
Source code of the paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"
☆126 · Updated 3 years ago
Alternatives and similar repositories for BPT:
Users interested in BPT are comparing it to the libraries listed below.
- Source code for "Efficient Training of BERT by Progressively Stacking" ☆112 · Updated 5 years ago
- ☆69 · Updated 4 years ago
- Code for the RecAdam paper "Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting". ☆115 · Updated 4 years ago
- Transformer with Untied Positional Encoding (TUPE). Code for the paper "Rethinking Positional Encoding in Language Pre-training". Improve exis… ☆250 · Updated 3 years ago
- A simple module that consistently outperforms self-attention and the Transformer model on the main NMT datasets, with SoTA performance. ☆86 · Updated last year
- Checking the interpretability of attention on text classification models ☆47 · Updated 5 years ago
- Non-Monotonic Sequential Text Generation (ICML 2019) ☆72 · Updated 5 years ago
- Reproduces the results of the paper "Compressing Word Embeddings via Deep Compositional Code Learning", accepted at ICLR 2018 ☆23 · Updated 6 years ago
- Graph-to-sequence implemented in PyTorch, combining graph convolutional networks and OpenNMT-py ☆151 · Updated 5 years ago
- Code for "Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View" ☆148 · Updated 5 years ago
- ☆74 · Updated 2 years ago
- ☆96 · Updated 4 years ago
- Non-autoregressive Neural Machine Translation (not a full version) ☆71 · Updated 2 years ago
- Code for "Graph-to-Sequence Learning using Gated Graph Neural Networks" ☆123 · Updated 4 years ago
- PyTorch language model for the 1-Billion Word (LM1B / GBW) dataset ☆122 · Updated 5 years ago
- Code for the paper "A Theoretical Analysis of the Repetition Problem in Text Generation" (AAAI 2021). ☆51 · Updated 2 years ago
- ☆83 · Updated 5 years ago
- ☆92 · Updated 4 years ago
- In this project we develop new deep learning models for bootstrapping language understanding models for languages with no labeled data us… ☆77 · Updated 2 years ago
- Neural Module Network for Reasoning over Text (ICLR 2020) ☆121 · Updated 4 years ago
- Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization" ☆89 · Updated 3 years ago
- Visualization for simple attention and Google's multi-head attention. ☆68 · Updated 6 years ago
- Densely Connected Graph Convolutional Networks for Graph-to-Sequence Learning (authors' MXNet implementation for the TACL19 paper) ☆78 · Updated 3 years ago
- Code for the ACL 2021 paper "GLGE: A New General Language Generation Evaluation Benchmark" ☆58 · Updated 2 years ago
- ☆213 · Updated 4 years ago
- Code for the NIPS 2018 paper "Frequency-Agnostic Word Representation" ☆117 · Updated 5 years ago
- Code release for our arXiv paper "Revisiting Few-sample BERT Fine-tuning" (https://arxiv.org/abs/2006.05987). ☆184 · Updated last year
- Implementation of the ICLR 2020 paper "Revisiting Self-Training for Neural Sequence Generation" ☆46 · Updated 2 years ago
- ☆94 · Updated 3 years ago
- Submission to ICLR ☆46 · Updated last year