layer6ai-labs / T-Fixup

Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"
89Updated 3 years ago

Related projects: