ofirpress / sandwich_transformer

This repository contains the code for running the character-level Sandwich Transformers from our ACL 2020 paper on Improving Transformer Models by Reordering their Sublayers.

☆55

Alternatives and similar repositories for sandwich_transformer

Users who are interested in sandwich_transformer are comparing it to the libraries listed below.

UriSha / EmbeddinglessNMT
The implementation of "Neural Machine Translation without Embeddings", NAACL 2021
☆33 Updated 4 years ago
allenai / sledgehammer
☆48 Updated 5 years ago
nng555 / ssmba
☆62 Updated 3 years ago
cambridgeltl / parameter-factorization
Factorization of the neural parameter space for zero-shot multi-lingual and multi-task transfer
☆39 Updated 5 years ago
namisan / exdeep-nmt
☆32 Updated 4 years ago
harvardnlp / cascaded-generation
Cascaded Text Generation with Markov Transformers
☆129 Updated 2 years ago
TimDettmers / transformer-xl
☆64 Updated 5 years ago
zbloss / reformer_lm
a Pytorch implementation of the Reformer Network (https://openreview.net/pdf?id=rkgNKkHtvB)
☆53 Updated 2 years ago
carolinlawrence / BiSon
Code for bidirectional sequence generation (BiSon) for generating from BERT pre-trained models.
☆51 Updated 5 years ago
fallcat / stupidNMT
Hard-Coded Gaussian Attention for Neural Machine Translation
☆36 Updated 2 years ago
efficientqa / nq-open
☆31 Updated 5 years ago
neulab / lrlm
Code for the paper "Latent Relation Language Models" at AAAI-20.
☆41 Updated last month
MultiPath / Efficient-Neural-Machine-Translation
PhD thesis (updating) of Jiatao Gu from HKU
☆19 Updated 7 years ago
nttcslab-nlp / doc_lm
☆12 Updated 6 years ago
intersun / CoDIR
Code for EMNLP 2020 paper CoDIR
☆41 Updated 3 years ago
leo-liuzy / probe-across-time
☆22 Updated 4 years ago
seilna / CNN-Units-in-NLP
Repository for our ICLR 2019 paper: Discovery of Natural Language Concepts in Individual Units of CNNs
☆26 Updated 6 years ago
facebookresearch / DisCo
DisCo Transformer for Non-autoregressive MT
☆77 Updated 3 years ago
lucidrains / marge-pytorch
Implementation of Marge, Pre-training via Paraphrasing, in Pytorch
☆76 Updated 4 years ago
yandex-research / graph-glove
PyTorch code for the EMNLP 2020 paper "Embedding Words in Non-Vector Space with Unsupervised Graph Learning"
☆41 Updated 4 years ago
jungokasai / deep-shallow
☆44 Updated 5 years ago
vid-koci / bert-commonsense
Code for papers "A Surprisingly Robust Trick for Winograd Schema Challenge" and "WikiCREM: A Large Unsupervised Corpus for Coreference Re…
☆71 Updated 3 years ago
facebookresearch / QA-Overlap
Code to support the paper "Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets"
☆66 Updated 4 years ago
gsarti / lambda-bert
A 🤗-style implementation of BERT using lambda layers instead of self-attention
☆69 Updated 5 years ago
clovaai / length-adaptive-transformer
Official Pytorch Implementation of Length-Adaptive Transformer (ACL 2021)
☆102 Updated 5 years ago
JunjieHu / amber
Explicit Alignment Objectives for Multilingual Bidirectional Encoders
☆14 Updated 4 years ago
shoarora / transformers-trainers
Tools for training pytorch language models
☆27 Updated 5 years ago
seraphlabs-ca / SentenceMIM-demo
This repo contains code to reproduce some of the results presented in the paper "SentenceMIM: A Latent Variable Language Model"
☆28 Updated 3 years ago
allenai / tpu_pretrain
LM Pretraining with PyTorch/TPU
☆136 Updated 6 years ago
mandarjoshi90 / pair2vec
pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference
☆61 Updated 2 years ago