lancopku / AdaNorm

Code for "Understanding and Improving Layer Normalization"

☆46

Alternatives and similar repositories for AdaNorm

Users who are interested in AdaNorm are comparing it to the repositories listed below.

10-zin / Synthesizer
A PyTorch implementation of the paper - "Synthesizer: Rethinking Self-Attention in Transformer Models"
☆73 Updated 2 years ago
zhuohan123 / macaron-net
Code for "Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View"
☆148 Updated 6 years ago
cloneofsimo / realformer-pytorch
Implementation of RealFormer using PyTorch
☆101 Updated 4 years ago
jackyyy0228 / Order-free-Learning-Alleviating-Exposure-Bias-in-Multi-label-Classification
☆20 Updated 5 years ago
facebookresearch / DisCo
DisCo Transformer for Non-autoregressive MT
☆77 Updated 3 years ago
lancopku / Prime
A simple module that consistently outperforms self-attention and the Transformer model on the main NMT datasets, with SoTA performance.
☆86 Updated 2 years ago
szhangtju / The-compression-of-Transformer
☆64 Updated 4 years ago
lioutasb / TaLKConvolutions
Official PyTorch implementation of Time-aware Large Kernel (TaLK) Convolutions (ICML 2020)
☆29 Updated 4 years ago
RayeRen / multilingual-kd-pytorch
ICLR2019, Multilingual Neural Machine Translation with Knowledge Distillation
☆70 Updated 5 years ago
yaohungt / TransformerDissection
[EMNLP'19] Summary for Transformer Understanding
☆53 Updated 5 years ago
zlinao / Variational-Transformer
Variational Transformers for Diverse Response Generation
☆81 Updated last year
epfml / collaborative-attention
Code for Multi-Head Attention: Collaborate Instead of Concatenate
☆151 Updated 2 years ago
layer6ai-labs / T-Fixup
Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"
☆89 Updated 4 years ago
harvardnlp / cascaded-generation
Cascaded Text Generation with Markov Transformers
☆129 Updated 2 years ago
iedwardwangi / MetaAdapter
☆22 Updated 4 years ago
KrisKorrel / sparsemax-pytorch
Implementation of the Sparsemax activation in PyTorch
☆164 Updated 5 years ago
bojone / univae
A single-model, multi-scale VAE model based on the Transformer
☆57 Updated 4 years ago
jaywalnut310 / Vector-Quantized-Autoencoders
TensorFlow implementation of "Theory and Experiments on Vector Quantized Autoencoders"
☆14 Updated 6 years ago
linzehui / Curriculum-Learning-PaperList-Materials
Papers and materials related to curriculum learning
☆53 Updated 4 years ago
HA-Transformer / MAT
The implementation of multi-branch attentive Transformer (MAT).
☆33 Updated 5 years ago
FreedomIntelligence / complex-order
☆84 Updated 5 years ago
leaderj1001 / Synthesizer-Rethinking-Self-Attention-Transformer-Models
An implementation of "Synthesizer: Rethinking Self-Attention in Transformer Models" using PyTorch
☆70 Updated 5 years ago
lancopku / Explicit-Sparse-Transformer
Code for Explicit Sparse Transformer
☆61 Updated 2 years ago
Junya-Chen / FlatCLR
FlatNCE: A Novel Contrastive Representation Learning Objective
☆90 Updated 3 years ago
rabeehk / vibert
Implementation of "Variational Information Bottleneck for Effective Low-resource Fine-tuning", ICLR 2021
☆41 Updated 4 years ago
lemmonation / jm-nat
Code for ACL2020 "Jointly Masked Sequence-to-Sequence Model for Non-Autoregressive Neural Machine Translation"
☆39 Updated 5 years ago
yzh119 / BPT
Source code for the paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"
☆128 Updated 4 years ago
jack57lee / Diversify-MHA
EMNLP 2018: Multi-Head Attention with Disagreement Regularization; NAACL 2019: Information Aggregation for Multi-Head Attention with Rout…
☆21 Updated 5 years ago
Edward-Sun / structured-nart
☆15 Updated 5 years ago
ItzikMalkiel / MTAdam
MTAdam: Automatic Balancing of Multiple Training Loss Terms
☆36 Updated 4 years ago