Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention
☆269 · Aug 10, 2021 · Updated 4 years ago
Alternatives and similar repositories for sinkhorn-transformer
Users who are interested in sinkhorn-transformer are comparing it to the libraries listed below.
- My take on a practical implementation of Linformer for Pytorch. ☆423 · Jul 27, 2022 · Updated 3 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Oct 9, 2022 · Updated 3 years ago
- Transformer training code for sequential tasks ☆609 · Sep 14, 2021 · Updated 4 years ago
- Pytorch library for fast transformer implementations ☆1,765 · Mar 23, 2023 · Updated 3 years ago
- High performance pytorch modules ☆17 · Jan 14, 2023 · Updated 3 years ago
- Cascaded Text Generation with Markov Transformers ☆130 · Mar 20, 2023 · Updated 3 years ago
- The entmax mapping and its loss, a family of sparse softmax alternatives. ☆465 · Jun 22, 2024 · Updated last year
- Axial Positional Embedding for Pytorch ☆84 · Feb 25, 2025 · Updated last year
- Pytorch implementation of Compressive Transformers, from Deepmind ☆163 · Oct 4, 2021 · Updated 4 years ago
- Longformer: The Long-Document Transformer ☆2,189 · Feb 8, 2023 · Updated 3 years ago
- ☆221 · Jun 8, 2020 · Updated 5 years ago
- Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning" ☆127 · Apr 5, 2021 · Updated 4 years ago
- Generalizing Natural Language Analysis through Span-relation Representations ☆91 · Sep 22, 2025 · Updated 6 months ago
- Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers" ☆1,611 · Aug 12, 2020 · Updated 5 years ago
- ☆19 · Oct 26, 2022 · Updated 3 years ago
- a Pytorch implementation of the Reformer Network (https://openreview.net/pdf?id=rkgNKkHtvB) ☆53 · Nov 22, 2022 · Updated 3 years ago
- Sparse and structured neural attention mechanisms ☆225 · Aug 31, 2020 · Updated 5 years ago
- [ICLR 2020] Lite Transformer with Long-Short Range Attention ☆610 · Jul 11, 2024 · Updated last year
- Neural Text Generation with Unlikelihood Training ☆310 · Aug 31, 2021 · Updated 4 years ago
- Code for Dissecting Generation Modes for Abstractive Summarization Models via Ablation and Attribution (ACL2021) ☆13 · Jun 2, 2021 · Updated 4 years ago
- Code for paper by Bamler & Mandt, "Extreme Classification via Adversarial Softmax Approximation" (ICLR 2020) ☆14 · Apr 8, 2020 · Updated 5 years ago
- ☆65 · Apr 8, 2020 · Updated 5 years ago
- ☆21 · Mar 15, 2023 · Updated 3 years ago
- Understanding the Difficulty of Training Transformers ☆332 · May 31, 2022 · Updated 3 years ago
- KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows ☆1,163 · Updated this week
- Source code for "On the Relationship between Self-Attention and Convolutional Layers" ☆1,118 · Jan 10, 2023 · Updated 3 years ago
- Fast, general, and tested differentiable structured prediction in PyTorch ☆1,124 · Apr 20, 2022 · Updated 3 years ago
- LEARNING LATENT PERMUTATIONS WITH GUMBEL-SINKHORN NETWORKS IMPLEMENTATION WITH PYTORCH ☆82 · Jul 6, 2023 · Updated 2 years ago
- Fast Differentiable Sorting and Ranking ☆621 · Feb 15, 2024 · Updated 2 years ago
- An implementation of Transformer with Expire-Span, a circuit for learning which memories to retain ☆34 · Oct 30, 2020 · Updated 5 years ago
- Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces, NeurIPS 2021 ☆14 · Dec 11, 2021 · Updated 4 years ago
- Implementation and experiments for Partially Supervised NER via Expected Entity Ratio in TACL 2022 ☆14 · Nov 7, 2022 · Updated 3 years ago
- Official Repository for "Modeling Hierarchical Structures with Continuous Recursive Neural Networks" (ICML 2021) ☆11 · Aug 18, 2021 · Updated 4 years ago
- higher is a pytorch library allowing users to obtain higher order gradients over losses spanning training loops rather than individual tr… ☆1,628 · Mar 25, 2022 · Updated 3 years ago