laiguokun / Funnel-Transformer

☆218

Alternatives and similar repositories for Funnel-Transformer

Users who are interested in Funnel-Transformer compare it to the repositories listed below.

LiyuanLucasLiu / Transformer-Clinic
Understanding the Difficulty of Training Transformers
☆329 Updated 3 years ago
guolinke / TUPE
Transformer with Untied Positional Encoding (TUPE). Code of paper "Rethinking Positional Encoding in Language Pre-training". Improve exis…
☆251 Updated 3 years ago
harvardnlp / cascaded-generation
Cascaded Text Generation with Markov Transformers
☆129 Updated 2 years ago
facebookresearch / unlikelihood_training
Neural Text Generation with Unlikelihood Training
☆309 Updated 3 years ago
pmichel31415 / are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
☆172 Updated 5 years ago
XuezheMax / flowseq
Generative Flow based Sequence-to-Sequence Toolkit written in Python.
☆245 Updated 5 years ago
layer6ai-labs / T-Fixup
Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"
☆89 Updated 4 years ago
epfml / collaborative-attention
Code for Multi-Head Attention: Collaborate Instead of Concatenate
☆152 Updated 2 years ago
ofirpress / sandwich_transformer
This repository contains the code for running the character-level Sandwich Transformers from our ACL 2020 paper on Improving Transformer …
☆55 Updated 4 years ago
zomux / lanmt
LaNMT: Latent-variable Non-autoregressive Neural Machine Translation with Deterministic Inference
☆80 Updated 3 years ago
lucidrains / marge-pytorch
Implementation of Marge, Pre-training via Paraphrasing, in Pytorch
☆76 Updated 4 years ago
yzh119 / BPT
Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"
☆128 Updated 4 years ago
tnq177 / transformers_without_tears
Transformers without Tears: Improving the Normalization of Self-Attention
☆132 Updated last year
graykode / ALBERT-Pytorch
Pytorch Implementation of ALBERT (A Lite BERT for Self-supervised Learning of Language Representations)
☆226 Updated 4 years ago
bloodwass / mixout
Implementation of Mixout with PyTorch
☆75 Updated 2 years ago
asappresearch / revisit-bert-finetuning
For the code release of our arXiv paper "Revisiting Few-sample BERT Fine-tuning" (https://arxiv.org/abs/2006.05987).
☆184 Updated 2 years ago
MC-BERT / MC-BERT
☆96 Updated 5 years ago
facebookresearch / Mask-Predict
A masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a…
☆244 Updated 3 years ago
serrano-s / attn-tests
Checking the interpretability of attention on text classification models
☆49 Updated 6 years ago
ofirpress / shortformer
Code for the Shortformer model, from the ACL 2021 paper by Ofir Press, Noah A. Smith and Mike Lewis.
☆147 Updated 4 years ago
richarddwang / electra_pytorch
Pretrain and finetune ELECTRA with fastai and huggingface. (Results of the paper replicated!)
☆330 Updated last year
cindyxinyiwang / deep-latent-sequence-model
Pytorch implementation of "A Probabilistic Formulation of Unsupervised Text Style Transfer" by He et al. at ICLR 2020
☆163 Updated 2 years ago
zbloss / reformer_lm
a Pytorch implementation of the Reformer Network (https://openreview.net/pdf?id=rkgNKkHtvB)
☆53 Updated 2 years ago
nyu-dl / bert-gen
☆323 Updated 2 years ago
harvardnlp / encoder-agnostic-adaptation
Encoder-Agnostic Adaptation for Conditional Language Generation
☆79 Updated last year
clovaai / length-adaptive-transformer
Official Pytorch Implementation of Length-Adaptive Transformer (ACL 2021)
☆101 Updated 4 years ago
uds-lsv / bert-stable-fine-tuning
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
☆136 Updated last year
facebookresearch / simmc
With the aim of building next generation virtual assistants that can handle multimodal inputs and perform multimodal actions, we introduc…
☆133 Updated last year
google-deepmind / lamb
LAnguage Modelling Benchmarks
☆138 Updated 5 years ago
facebookresearch / SentAugment
SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c…
☆361 Updated 3 years ago