aced125 / sparsemax
A PyTorch Implementation of the Sparsemax operator (https://arxiv.org/pdf/1803.09820.pdf)
☆32 Updated 2 years ago
Alternatives and similar repositories for sparsemax
Users interested in sparsemax are comparing it to the libraries listed below.
- Exemplar VAE: Linking Generative Models, Nearest Neighbor Retrieval, and Data Augmentation ☆69 Updated 4 years ago
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆99 Updated 2 years ago
- ☆109 Updated 2 years ago
- Implementation of Flow++ in PyTorch ☆41 Updated 5 years ago
- Transformers with doubly stochastic attention ☆45 Updated 2 years ago
- Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers. ☆105 Updated 3 years ago
- Pytorch implementation of the Power Spherical distribution ☆74 Updated 9 months ago
- ☆164 Updated 2 years ago
- Code repository of the paper "CKConv: Continuous Kernel Convolution For Sequential Data" published at ICLR 2022. https://arxiv.org/abs/21… ☆121 Updated 2 years ago
- [EMNLP'19] Summary for Transformer Understanding ☆53 Updated 5 years ago
- Implementation of Sparsemax activation in Pytorch ☆160 Updated 4 years ago
- Adaptive Gradient Clipping ☆131 Updated 2 years ago
- Discrete Normalizing Flows implemented in PyTorch ☆112 Updated 3 years ago
- Official code for Long Expressive Memory (ICLR 2022, Spotlight) ☆69 Updated 3 years ago
- Reparameterize your PyTorch modules ☆71 Updated 4 years ago
- Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi… ☆50 Updated 3 years ago
- ☆68 Updated 2 years ago
- Sequence Modeling with Structured State Spaces ☆63 Updated 2 years ago
- ☆148 Updated 3 years ago
- A curated list of techniques to avoid posterior collapse ☆87 Updated 2 years ago
- Code for the paper PermuteFormer ☆42 Updated 3 years ago
- ☆47 Updated 2 years ago
- Pytorch Implementation of OpenAI's "Improved Variational Inference with Inverse Autoregressive Flow" ☆80 Updated 5 years ago
- Relative Positional Encoding for Transformers with Linear Complexity ☆63 Updated 3 years ago
- Official PyTorch BIVA implementation (BIVA: A Very Deep Hierarchy of Latent Variables for Generative Modeling) ☆84 Updated 2 years ago
- Stochastic Normalizing Flows ☆76 Updated 3 years ago
- Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch ☆40 Updated 2 years ago
- An implementation of Variational Autoencoders with a constant balance between reconstruction error and Kullback-Leibler divergence ☆19 Updated 4 years ago
- Differentiable Sorting Networks ☆114 Updated last year
- ☆49 Updated 4 years ago