gonglinyuan / StackingBERTLinks

Source code for "Efficient Training of BERT by Progressively Stacking"

☆113

Alternatives and similar repositories for StackingBERT

Users that are interested in StackingBERT are comparing it to the libraries listed below

Sorting:

MultiPath / Squirrel
PyTorch implementation of Transformer-based Neural Machine Translation
☆78Updated 2 years ago
microsoft / DualLearning
A dual learning toolkit developed by Microsoft Research
☆71Updated 2 years ago
ExplorerFreda / TreeEnc
[EMNLP 2018] On Tree-Based Neural Sentence Modeling.
☆64Updated 6 years ago
nyu-dl / dl4mt-nonauto
☆119Updated 6 years ago
bzhangGo / transformer-aan
souce code for "Accelerating Neural Transformer via an Average Attention Network"
☆78Updated 6 years ago
yzh119 / BPT
Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"
☆128Updated 4 years ago
pmichel31415 / are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
☆172Updated 5 years ago
wellecks / nonmonotonic_text
Non-Monotonic Sequential Text Generation (ICML 2019)
☆72Updated 6 years ago
LiyuanLucasLiu / Torch-Scope
A Toolkit for Training, Tracking, Saving Models and Syncing Results
☆62Updated 5 years ago
pramodkaushik / acl18_results
Code to reproduce results in our ACL 2018 paper "Did the Model Understand the Question?"
☆33Updated 7 years ago
nyu-dl / dl4mt-seqgen
☆31Updated 6 years ago
MultiPath / NA-NMT
Non-autoregressive Neural Machine Translation (not a full version)
☆70Updated 2 years ago
pytorch-tpu / fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
☆22Updated 2 years ago
vanzytay / NIPS2018_DECAPROP
Implementation of Densely Connected Attention Propagation for Reading Comprehension (NIPS 2018)
☆69Updated 6 years ago
mingdachen / disentangle-semantics-syntax
Code for "A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations" (NAACL 2019)
☆67Updated 4 years ago
zhuohan123 / macaron-net
Codes for "Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View"
☆148Updated 6 years ago
ZhitingHu / text_content_manipulation
Text Content Manipulation
☆45Updated 4 years ago
robinjia / adversarial-squad
Code from Jia and Liang, "Adversarial Examples for Evaluating Reading Comprehension Systems" (EMNLP 2017)
☆118Updated 7 years ago
lancopku / Prime
A simple module consistently outperforms self-attention and Transformer model on main NMT datasets with SoTA performance.
☆86Updated 2 years ago
HA-Transformer / MAT
The implementation of multi-branch attentive Transformer (MAT).
☆33Updated 5 years ago
kimiyoung / glomo
Unsupervised Learning of Transferable Relational Graphs
☆69Updated 6 years ago
SkyAndCloud / awesome-transformer
This repo is not maintained. For latest version, please visit https://github.com/ictnlp. A collection of transformer's guides, implementa…
☆44Updated 6 years ago
jihunchoi / unsupervised-treelstm
☆121Updated 7 years ago
FranxYao / Gumbel-CRF
Implementation of NeurIPS 20 paper: Latent Template Induction with Gumbel-CRFs
☆57Updated 4 years ago
hhexiy / debiased
Learn models that are robust to spurious correlations in the dataset.
☆26Updated 5 years ago
microsoft / DynSP
Search-based-Neural-Structured-Learning-for-Sequential-Question-Answering
☆32Updated 2 years ago
dojoteef / synst
Source code to reproduce the results in the ACL 2019 paper "Syntactically Supervised Transformers for Faster Neural Machine Translation"
☆81Updated 3 years ago
Noahs-ARK / MAE
☆21Updated 5 years ago
galsang / BiBloSA-pytorch
Re-implementation of Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling (T. Shen et al., ICLR 2018) on P…
☆42Updated 7 years ago
LaraQianYang / Ouroboros
Ouroboros: On Accelerating Training of Transformer-Based Language Models
☆10Updated 5 years ago