facebookresearch / adaptive-span

Transformer training code for sequential tasks

☆612

Alternatives and similar repositories for adaptive-span

Users who are interested in adaptive-span are comparing it to the libraries listed below.

asyml / texar-pytorch
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CAS…
☆745 Updated 3 years ago
harvardnlp / pytorch-struct
Fast, general, and tested differentiable structured prediction in PyTorch
☆1,114 Updated 3 years ago
Smerity / sha-rnn
Single Headed Attention RNN - "Stop thinking with your head"
☆1,182 Updated 3 years ago
LiyuanLucasLiu / Transformer-Clinic
Understanding the Difficulty of Training Transformers
☆329 Updated 3 years ago
elbayadm / attn2d
Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction
☆502 Updated 4 years ago
andreamad8 / Universal-Transformer-Pytorch
Implementation of Universal Transformer in Pytorch
☆261 Updated 6 years ago
zihangdai / mos
☆395 Updated 6 years ago
yikangshen / Ordered-Neurons
Code for the paper "Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks"
☆580 Updated 5 years ago
cybertronai / pytorch-lamb
Implementation of https://arxiv.org/abs/1904.00962
☆376 Updated 4 years ago
harvardnlp / var-attn
Latent Alignment and Variational Attention
☆327 Updated 6 years ago
ofirpress / YouMayNotNeedAttention
Code for the Eager Translation Model from the paper You May Not Need Attention
☆295 Updated 6 years ago
laiguokun / Funnel-Transformer
☆218 Updated 5 years ago
graykode / xlnet-Pytorch
Simple XLNet implementation with Pytorch Wrapper
☆581 Updated 6 years ago
microsoft / fastformers
FastFormers - highly efficient transformer models for NLU
☆705 Updated 3 months ago
nyu-dl / bert-gen
☆323 Updated 2 years ago
alexa / bort
Repository for the paper "Optimal Subarchitecture Extraction for BERT"
☆473 Updated 3 years ago
eladhoffer / seq2seq.pytorch
Sequence-to-Sequence learning using PyTorch
☆520 Updated 5 years ago
sacmehta / delight
DeLighT: Very Deep and Light-Weight Transformers
☆469 Updated 4 years ago
graykode / ALBERT-Pytorch
Pytorch Implementation of ALBERT (A Lite BERT for Self-supervised Learning of Language Representations)
☆226 Updated 4 years ago
facebookresearch / unlikelihood_training
Neural Text Generation with Unlikelihood Training
☆309 Updated 3 years ago
successar / AttentionExplanation
☆315 Updated 3 years ago
lucidrains / routing-transformer
Fully featured implementation of Routing Transformer
☆297 Updated 3 years ago
Maluuba / gensen
Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning
☆311 Updated 4 years ago
cybertronai / transformer-xl
Training Transformer-XL on 128 GPUs
☆140 Updated 5 years ago
deep-spin / entmax
The entmax mapping and its loss, a family of sparse softmax alternatives.
☆441 Updated last year
huggingface / naacl_transfer_learning_tutorial
Repository of code for the tutorial on Transfer Learning in NLP held at NAACL 2019 in Minneapolis, MN, USA
☆723 Updated 5 years ago
huggingface / hmtl
🌊HMTL: Hierarchical Multi-Task Learning - A State-of-the-Art neural network model for several NLP tasks based on PyTorch and AllenNLP
☆1,196 Updated last year
DSE-MSU / R-transformer
Pytorch implementation of R-Transformer. Some parts of the code are adapted from the implementation of TCN and Transformer.
☆230 Updated 6 years ago
tatp22 / linformer-pytorch
My take on a practical implementation of Linformer for Pytorch.
☆416 Updated 2 years ago
clarkkev / attention-analysis
☆467 Updated 4 years ago