layer6ai-labs/T-Fixup

Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"

☆90

Alternatives and similar repositories for T-Fixup

Users who are interested in T-Fixup are comparing it to the libraries listed below.

LiyuanLucasLiu / Transformer-Clinic
Understanding the Difficulty of Training Transformers
☆332 · May 31, 2022 · Updated 4 years ago
zhengzx-nlp / REDER
[NeurIPS 2021] Duplex Sequence-to-Sequence Learning for Reversible Machine Translation
☆15 · Jun 7, 2022 · Updated 4 years ago
sIncerass / powernorm
[ICML 2020] Code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845
☆120 · Jun 20, 2021 · Updated 5 years ago
libeineu / SDT-Training
The implementation of "Shallow-to-Deep Training for Neural Machine Translation"
☆10 · Oct 26, 2020 · Updated 5 years ago
ictnlp / DiverseNMT
Source code for the AAAI 2020 long paper "Modeling Fluency and Faithfulness for Diverse Neural Machine Translation".
☆19 · Mar 10, 2020 · Updated 6 years ago
ictnlp / OR-NMT
Source code for the ACL 2019 paper "Bridging the Gap between Training and Inference for Neural Machine Translation"
☆41 · Nov 10, 2020 · Updated 5 years ago
howardchenhd / Syntax-awared-NMT
Improved Neural Machine Translation with a Syntax-Aware Encoder and Decoder
☆42 · Nov 22, 2017 · Updated 8 years ago
ictnlp / RSI-NAT
Source code for "Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation"
☆18 · Aug 31, 2019 · Updated 6 years ago
harvardnlp / cascaded-generation
Cascaded Text Generation with Markov Transformers
☆130 · Mar 20, 2023 · Updated 3 years ago
zhuchen03 / gradinit
Learning to Initialize Neural Networks for Stable and Efficient Training
☆138 · May 24, 2022 · Updated 4 years ago
lucidrains / all-normalization-transformer
A simple Transformer where the softmax has been replaced with normalization
☆20 · Sep 11, 2020 · Updated 5 years ago
wangqiangneu / dlcl
The implementation of "Learning Deep Transformer Models for Machine Translation"
☆116 · Jul 25, 2024 · Updated 2 years ago
shawnkx / Fully-NAT
☆17 · Jul 5, 2022 · Updated 4 years ago
mlpc-ucsd / BERT_Convolutions
(ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models.
☆21 · Jul 13, 2022 · Updated 4 years ago
bicici / FDA
Feature Decay Algorithms
☆11 · Mar 5, 2014 · Updated 12 years ago
zomux / lanmt
LaNMT: Latent-variable Non-autoregressive Neural Machine Translation with Deterministic Inference
☆80 · Aug 27, 2021 · Updated 4 years ago
mit-han-lab / lite-transformer
[ICLR 2020] Lite Transformer with Long-Short Range Attention
☆609 · Jul 11, 2024 · Updated 2 years ago
zhuohan123 / hint-nart
☆10 · Feb 12, 2020 · Updated 6 years ago
danijar / teleport
Efficiently send large arrays across machines
☆15 · Jul 24, 2024 · Updated 2 years ago
thompsonb / fairseq-smrt
Code for "Simulated Multiple Reference Training Improves Low-Resource Machine Translation"
☆15 · Dec 1, 2020 · Updated 5 years ago
bzhangGo / transformer-aan
Source code for "Accelerating Neural Transformer via an Average Attention Network"
☆78 · Jul 3, 2019 · Updated 7 years ago
sustcsonglin / second-order-neural-dmv
Source code for the COLING 2020 paper "Second-Order Unsupervised Neural Dependency Parsing"
☆16 · Oct 24, 2022 · Updated 3 years ago
LIJUNYI95 / SuperAdam
Official PyTorch implementation for the paper "SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients"
☆17 · Jan 12, 2022 · Updated 4 years ago
nyu-dl / dl4mt-multi-src
☆19 · Mar 15, 2017 · Updated 9 years ago
fedden / TensorFlow-Efficient-Neural-Audio-Synthesis
☆20 · Feb 27, 2018 · Updated 8 years ago
microsoft / BANG
BANG is a new pretraining model to Bridge the gap between Autoregressive (AR) and Non-autoregressive (NAR) Generation. AR and NAR generat…
☆28 · Feb 6, 2022 · Updated 4 years ago
libeineu / Context-Aware
The implementation of "Does Multi-Encoder Help? A Case Study on Context-Aware Neural Machine Translation"
☆39 · Aug 26, 2020 · Updated 5 years ago
fandongmeng / DTMT_InDec
Implementation of DTMT with incremental decoding
☆13 · Feb 20, 2021 · Updated 5 years ago
vyraun / long-tailed
Code for "On Long-Tailed Phenomena in NMT".
☆10 · Jan 10, 2021 · Updated 5 years ago
nlp-compromise / penn-treebank
A small, non-commercial, fair-use subset of the Penn Treebank, in JSON.
☆17 · Apr 10, 2018 · Updated 8 years ago
dojoteef / synst
Source code to reproduce the results in the ACL 2019 paper "Syntactically Supervised Transformers for Faster Neural Machine Translation"
☆80 · Oct 6, 2022 · Updated 3 years ago
libeineu / UMST
☆11 · Jun 1, 2023 · Updated 3 years ago
coastalcph / seq2sparql
Multilingual Compositional Wikidata Questions (MCWQ)
☆20 · Jun 12, 2023 · Updated 3 years ago
hsing-wang / WMT2020_BioMedical
☆15 · Jul 16, 2021 · Updated 5 years ago
microsoft / MPNet
MPNet: Masked and Permuted Pre-training for Language Understanding https://arxiv.org/pdf/2004.09297.pdf
☆300 · Sep 11, 2021 · Updated 4 years ago
emorynlp / elit
Emory Language and Information Toolkit
☆39 · Apr 16, 2025 · Updated last year
revsic / torch-retriever-vc
PyTorch implementation of Retriever: Learning Content-Style Representation
☆12 · Jan 27, 2023 · Updated 3 years ago
lucidrains / reformer-pytorch
Reformer, the efficient Transformer, in PyTorch
☆2,190 · Jun 21, 2023 · Updated 3 years ago
MultiPath / Efficient-Neural-Machine-Translation
PhD thesis (updating) of Jiatao Gu from HKU
☆19 · Aug 10, 2018 · Updated 7 years ago