XuezheMax / fairseq-apollo

FairSeq repo with Apollo optimizer

☆114

Alternatives and similar repositories for fairseq-apollo

Users who are interested in fairseq-apollo compare it to the libraries listed below.

jungokasai / deep-shallow
☆44 · Updated 5 years ago
bigscience-workshop / architecture-objective
☆98 · Updated 2 years ago
kernelmachine / demix
DEMix Layers for Modular Language Modeling
☆54 · Updated 4 years ago
princeton-nlp / DinkyTrain
Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration 🚃
☆114 · Updated 2 years ago
martiansideofthemoon / rankgen
Official code and model checkpoints for our EMNLP 2022 paper "RankGen - Improving Text Generation with Large Ranking Models" (https://arx…
☆138 · Updated 2 years ago
clovaai / length-adaptive-transformer
Official Pytorch Implementation of Length-Adaptive Transformer (ACL 2021)
☆102 · Updated 4 years ago
qqaatw / pytorch-realm-orqa
PyTorch reimplementation of REALM and ORQA
☆22 · Updated 3 years ago
jxhe / efficient-knnlm
Pytorch implementation of paper "Efficient Nearest Neighbor Language Models" (EMNLP 2021)
☆74 · Updated 3 years ago
tanyuqian / ctc-gen-eval
EMNLP 2021 - CTC: A Unified Framework for Evaluating Natural Language Generation
☆98 · Updated 2 years ago
IBM / PoWER-BERT
Method to improve inference time for BERT. This is an implementation of the paper titled "PoWER-BERT: Accelerating BERT Inference via Pro…
☆62 · Updated last month
NAR-tutorial / acl2022
☆99 · Updated 3 years ago
PiotrNawrot / dynamic-pooling
Efficient Transformers with Dynamic Token Pooling
☆64 · Updated 2 years ago
yoonkim / neural-qcfg
☆45 · Updated 4 years ago
pmichel31415 / are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
☆172 · Updated 5 years ago
nng555 / ssmba
☆62 · Updated 3 years ago
fuzihaofzh / repetition-problem-nlg
Code for the paper "A Theoretical Analysis of the Repetition Problem in Text Generation" in AAAI 2021.
☆55 · Updated 2 years ago
layer6ai-labs / T-Fixup
Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"
☆89 · Updated 4 years ago
lucidrains / memformer
Implementation of Memformer, a Memory-augmented Transformer, in Pytorch
☆123 · Updated 4 years ago
tnq177 / transformers_without_tears
Transformers without Tears: Improving the Normalization of Self-Attention
☆133 · Updated last year
lucidrains / marge-pytorch
Implementation of Marge, Pre-training via Paraphrasing, in Pytorch
☆76 · Updated 4 years ago
CAMTL / CA-MTL
Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data
☆57 · Updated 4 years ago
rosewang2008 / language_modeling_via_stochastic_processes
Language modeling via stochastic processes. Oral @ ICLR 2022.
☆138 · Updated 2 years ago
machelreid / diffuser
DiffusER: Discrete Diffusion via Edit-based Reconstruction (Reid, Hellendoorn & Neubig, 2022)
☆54 · Updated 2 months ago
microsoft / EfficientLongSequenceModeling
☆51 · Updated 2 years ago
vklabmipt / implicit-unlikelihood-training
Improving Neural Text Generation with Reinforcement Learning
☆22 · Updated 4 years ago
facebookresearch / DisCo
DisCo Transformer for Non-autoregressive MT
☆77 · Updated 3 years ago
bloodwass / mixout
Implementation of Mixout with PyTorch
☆75 · Updated 2 years ago
tau-nlp / scrolls
The official code of EMNLP 2022, "SCROLLS: Standardized CompaRison Over Long Language Sequences".
☆69 · Updated last year
harvardnlp / cascaded-generation
Cascaded Text Generation with Markov Transformers
☆129 · Updated 2 years ago
cliang1453 / SAGE
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models (ICLR 2022)
☆29 · Updated 3 years ago