aced125 / sparsemax
A PyTorch Implementation of the Sparsemax operator (https://arxiv.org/pdf/1803.09820.pdf)
☆31 · Updated 2 years ago
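Sparsemax maps a vector of scores to a probability distribution, much as softmax does. Unlike softmax, its output can contain exact zeros, because it is the Euclidean projection of the scores onto the probability simplex. The following is a minimal pure-Python sketch of the common sort-and-threshold formulation. It is not this repository's PyTorch code, and the function name `sparsemax` is only illustrative.

```python
def sparsemax(z):
    """Illustrative sparsemax: Euclidean projection of scores z onto the simplex."""
    if not z:
        raise ValueError("sparsemax needs at least one score")
    # Sort the scores in descending order: z_(1) >= z_(2) >= ...
    zs = sorted(z, reverse=True)
    cumsum = 0.0
    support_size = 0
    support_sum = 0.0
    for k, zk in enumerate(zs, start=1):
        cumsum += zk
        # Index k is in the support while 1 + k * z_(k) > sum_{j<=k} z_(j).
        # The condition always holds for k = 1, so the support is never empty.
        if 1 + k * zk > cumsum:
            support_size = k
            support_sum = cumsum
    # Threshold tau chosen so the clipped outputs sum to 1.
    tau = (support_sum - 1) / support_size
    return [max(zi - tau, 0.0) for zi in z]


print(sparsemax([2.0, 1.5, 0.1]))  # [0.75, 0.25, 0.0]
```

In this example the smallest score falls below the threshold `tau = 1.25` and gets exactly zero probability. Softmax would give it a small positive weight instead.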
Alternatives and similar repositories for sparsemax:
Users who are interested in sparsemax are comparing it to the libraries listed below:
- Implementation of Flow++ in PyTorch ☆41 · Updated 5 years ago
- Transformers with doubly stochastic attention ☆45 · Updated 2 years ago
- Implementation of Sparsemax activation in Pytorch ☆158 · Updated 4 years ago
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆97 · Updated last year
- Exemplar VAE: Linking Generative Models, Nearest Neighbor Retrieval, and Data Augmentation ☆69 · Updated 4 years ago
- ☆72 · Updated 3 years ago
- ☆49 · Updated 4 years ago
- Easy-to-use AdaHessian optimizer (PyTorch) ☆77 · Updated 4 years ago
- Sequence Modeling with Structured State Spaces ☆62 · Updated 2 years ago
- Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers. ☆102 · Updated 3 years ago
- Official code for Long Expressive Memory (ICLR 2022, Spotlight) ☆69 · Updated 2 years ago
- Discrete Normalizing Flows implemented in PyTorch ☆109 · Updated 3 years ago
- Relative Positional Encoding for Transformers with Linear Complexity ☆62 · Updated 2 years ago
- Code repository of the paper "CKConv: Continuous Kernel Convolution For Sequential Data" published at ICLR 2022. https://arxiv.org/abs/21… ☆119 · Updated 2 years ago
- Pytorch implementation of the Power Spherical distribution ☆74 · Updated 7 months ago
- ☆163 · Updated 2 years ago
- ☆147 · Updated 2 years ago
- Pytorch Implementation of OpenAI's "Improved Variational Inference with Inverse Autoregressive Flow" ☆80 · Updated 4 years ago
- Stochastic Normalizing Flows ☆75 · Updated 3 years ago
- ☆20 · Updated 2 years ago
- Code for the paper PermuteFormer ☆42 · Updated 3 years ago
- ☆67 · Updated 2 years ago
- Drop-in replacement for any ResNet with a significantly reduced memory footprint and better representation capabilities ☆209 · Updated 9 months ago
- ☆108 · Updated 2 years ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆122 · Updated last year
- Fast Discounted Cumulative Sums in PyTorch ☆95 · Updated 3 years ago
- MTAdam: Automatic Balancing of Multiple Training Loss Terms ☆36 · Updated 4 years ago
- Last-layer Laplace approximation code examples ☆83 · Updated 3 years ago
- Official PyTorch BIVA implementation (BIVA: A Very Deep Hierarchy of Latent Variables for Generative Modeling) ☆83 · Updated 2 years ago
- The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We s… ☆67 · Updated 2 years ago