lucidrains / sinkhorn-transformer
Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention
☆270 · Aug 10, 2021 · Updated 4 years ago
Alternatives and similar repositories for sinkhorn-transformer
Users that are interested in sinkhorn-transformer are comparing it to the libraries listed below
- My take on a practical implementation of Linformer for PyTorch. ☆422 · Jul 27, 2022 · Updated 3 years ago
- Reformer, the efficient Transformer, in PyTorch ☆2,193 · Jun 21, 2023 · Updated 2 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Oct 9, 2022 · Updated 3 years ago
- Implementation of Linformer for PyTorch ☆305 · Jan 5, 2024 · Updated 2 years ago
- PyTorch library for fast transformer implementations ☆1,761 · Mar 23, 2023 · Updated 2 years ago
- The entmax mapping and its loss, a family of sparse softmax alternatives. ☆459 · Jun 22, 2024 · Updated last year
- Transformer training code for sequential tasks ☆610 · Sep 14, 2021 · Updated 4 years ago
- PyTorch implementation of Compressive Transformers, from DeepMind ☆163 · Oct 4, 2021 · Updated 4 years ago
- Cascaded Text Generation with Markov Transformers ☆130 · Mar 20, 2023 · Updated 2 years ago
- High-performance PyTorch modules ☆18 · Jan 14, 2023 · Updated 3 years ago
- ☆221 · Jun 8, 2020 · Updated 5 years ago
- Axial Positional Embedding for PyTorch ☆84 · Feb 25, 2025 · Updated 11 months ago
- Implementation of Token Shift GPT, an autoregressive model that relies solely on shifting along the sequence dimension for mixing ☆49 · Jan 27, 2022 · Updated 4 years ago
- Longformer: The Long-Document Transformer ☆2,186 · Feb 8, 2023 · Updated 3 years ago
- Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk ☆47 · Jul 16, 2023 · Updated 2 years ago
- Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers" ☆1,608 · Aug 12, 2020 · Updated 5 years ago
- Generalizing Natural Language Analysis through Span-relation Representations ☆91 · Sep 22, 2025 · Updated 4 months ago
- Code for Dissecting Generation Modes for Abstractive Summarization Models via Ablation and Attribution (ACL2021) ☆13 · Jun 2, 2021 · Updated 4 years ago
- [ICLR 2020] Lite Transformer with Long-Short Range Attention ☆611 · Jul 11, 2024 · Updated last year
- Sparse and structured neural attention mechanisms ☆225 · Aug 31, 2020 · Updated 5 years ago
- Understanding the Difficulty of Training Transformers ☆332 · May 31, 2022 · Updated 3 years ago
- Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces, NeurIPS 2021 ☆14 · Dec 11, 2021 · Updated 4 years ago
- Fast, general, and tested differentiable structured prediction in PyTorch ☆1,123 · Apr 20, 2022 · Updated 3 years ago
- Source code for "On the Relationship between Self-Attention and Convolutional Layers" ☆1,116 · Jan 10, 2023 · Updated 3 years ago
- KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows ☆1,156 · Feb 6, 2026 · Updated last week
- ☆21 · Mar 15, 2023 · Updated 2 years ago
- Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT ☆224 · Aug 20, 2024 · Updated last year
- Neural Text Generation with Unlikelihood Training ☆310 · Aug 31, 2021 · Updated 4 years ago
- higher is a PyTorch library allowing users to obtain higher order gradients over losses spanning training loops rather than individual tr… ☆1,627 · Mar 25, 2022 · Updated 3 years ago
- Fast Differentiable Sorting and Ranking ☆616 · Feb 15, 2024 · Updated 2 years ago
- Implementation of deep implicit attention in PyTorch ☆65 · Aug 2, 2021 · Updated 4 years ago
- An implementation of Transformer with Expire-Span, a circuit for learning which memories to retain ☆34 · Oct 30, 2020 · Updated 5 years ago
- TensorFlow implementation of "Theory and Experiments on Vector Quantized Autoencoders" ☆15 · Feb 27, 2019 · Updated 6 years ago
- Code for the paper Deformable Butterfly: A Highly Structured and Sparse Linear Transform. ☆16 · Nov 1, 2021 · Updated 4 years ago
- Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning" ☆127 · Apr 5, 2021 · Updated 4 years ago
- Implementation and experiments for Partially Supervised NER via Expected Entity Ratio in TACL 2022 ☆14 · Nov 7, 2022 · Updated 3 years ago
- [ACL'20] Highway Transformer: A Gated Transformer. ☆33 · Dec 5, 2021 · Updated 4 years ago
- ☆153 · May 25, 2020 · Updated 5 years ago
- A PyTorch implementation of the Reformer Network (https://openreview.net/pdf?id=rkgNKkHtvB) ☆53 · Nov 22, 2022 · Updated 3 years ago