aced125 / sparsemax
A PyTorch Implementation of the Sparsemax operator (https://arxiv.org/pdf/1803.09820.pdf)
☆31 · Updated 2 years ago
Alternatives and similar repositories for sparsemax:
Users who are interested in sparsemax are comparing it to the libraries listed below.
- Sequence Modeling with Structured State Spaces ☆61 · Updated 2 years ago
- Transformers with doubly stochastic attention ☆44 · Updated 2 years ago
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in PyTorch ☆97 · Updated last year
- Implementation of Sparsemax activation in PyTorch ☆158 · Updated 4 years ago
- Fast Discounted Cumulative Sums in PyTorch ☆95 · Updated 3 years ago
- Relative Positional Encoding for Transformers with Linear Complexity ☆61 · Updated 2 years ago
- ☆67 · Updated 2 years ago
- Exemplar VAE: Linking Generative Models, Nearest Neighbor Retrieval, and Data Augmentation ☆68 · Updated 4 years ago
- ☆49 · Updated 4 years ago
- Easy-to-use AdaHessian optimizer (PyTorch) ☆77 · Updated 4 years ago
- Code repository of the paper "CKConv: Continuous Kernel Convolution For Sequential Data" published at ICLR 2022. https://arxiv.org/abs/21… ☆119 · Updated 2 years ago
- Multiplicative Normalizing Flows in PyTorch. ☆23 · Updated 2 weeks ago
- Discrete Normalizing Flows implemented in PyTorch ☆108 · Updated 3 years ago
- [EMNLP'19] Summary for Transformer Understanding ☆53 · Updated 5 years ago
- PyTorch implementation of the Power Spherical distribution ☆74 · Updated 6 months ago
- ☆72 · Updated 3 years ago
- [ICML 2024] SINGD: KFAC-like Structured Inverse-Free Natural Gradient Descent (http://arxiv.org/abs/2312.05705) ☆21 · Updated 2 months ago
- Adaptive Gradient Clipping ☆124 · Updated 2 years ago
- CUDA kernels for generalized matrix-multiplication in PyTorch ☆79 · Updated 3 years ago
- Code for the article "What if Neural Networks had SVDs?", to be presented as a spotlight paper at NeurIPS 2020. ☆72 · Updated 5 months ago
- [ICML 2024] SIRFShampoo: Structured inverse- and root-free Shampoo in PyTorch (https://arxiv.org/abs/2402.03496) ☆14 · Updated 2 months ago
- Implementation of Flow++ in PyTorch ☆41 · Updated 5 years ago
- Drop-in replacement for any ResNet with a significantly reduced memory footprint and better representation capabilities ☆209 · Updated 8 months ago
- Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi… ☆50 · Updated 2 years ago
- A lightweight transformer library for PyTorch ☆72 · Updated 3 years ago
- This library would form a permanent home for reusable components for deep probabilistic programming. The library would form and harness a… ☆302 · Updated last month
- Neural Spline Flow, RealNVP, Autoregressive Flow, 1x1Conv in PyTorch. ☆273 · Updated last year
- A PyTorch implementation of the optimal transport kernel embedding ☆114 · Updated 3 years ago
- Implementation of Nyström Self-attention, from the paper Nyströmformer ☆124 · Updated last year
- Official code for Long Expressive Memory (ICLR 2022, Spotlight) ☆69 · Updated 2 years ago