deep-spin / entmax
The entmax mapping and its loss, a family of sparse softmax alternatives.
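Entmax generalizes softmax by allowing exact zeros in the output probabilities: softmax (α=1) and sparsemax (α=2) are the endpoints of the family, with entmax-1.5 in between. A minimal NumPy sketch of sparsemax (not code from this repository; the library itself provides PyTorch implementations) illustrates the sparsity:

```python
import numpy as np

def sparsemax(z):
    """Sparsemax: Euclidean projection of z onto the probability simplex.

    Unlike softmax, entries of the result can be exactly zero.
    """
    z_sorted = np.sort(z)[::-1]            # sort scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum    # indices kept in the support
    k_z = k[support][-1]                   # size of the support
    tau = (cumsum[support][-1] - 1) / k_z  # threshold
    return np.maximum(z - tau, 0.0)

p = sparsemax(np.array([3.0, 1.0, 0.0]))
# p sums to 1, and low-scoring entries are exactly zero
```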
Related projects
Alternatives and complementary repositories for entmax
- Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention
- Implementation of Sparsemax activation in Pytorch
- Understanding the Difficulty of Training Transformers
- Transformer with Untied Positional Encoding (TUPE). Code of paper "Rethinking Positional Encoding in Language Pre-training". Improve exis…
- Fully featured implementation of Routing Transformer
- Code for Multi-Head Attention: Collaborate Instead of Concatenate