ysw1021 / AGG

A PyTorch implementation of "Rare Tokens Degenerate All Tokens: Improving Neural Text Generation via Adaptive Gradient Gating for Rare Token Embeddings"

Related projects: