Implementation of fused cosine similarity attention in the same style as Flash Attention
☆220, updated Feb 13, 2023
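The repository above implements cosine-similarity attention: queries and keys are l2-normalized before the dot product, so each logit is a bounded cosine similarity rather than an unbounded dot product, which is what makes the fused kernel numerically simpler. Below is a plain, non-fused NumPy sketch of the idea only, not the repository's CUDA implementation; the fixed `scale` of 10 is an illustrative choice, not a value taken from the repository.

```python
import numpy as np

def cosine_sim_attention(q, k, v, scale=10.0):
    """Plain (non-fused) cosine-similarity attention sketch.

    q, k: (n, d) arrays; v: (n, d_v). Queries and keys are
    l2-normalized, so each logit is a cosine similarity in [-1, 1],
    then multiplied by a fixed scale before the softmax.
    """
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)
    k = k / np.linalg.norm(k, axis=-1, keepdims=True)
    logits = scale * (q @ k.T)                     # bounded in [-scale, scale]
    logits -= logits.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                             # convex combination of values

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 8, 4))
out = cosine_sim_attention(q, k, v)
print(out.shape)  # (8, 4)
```

Because the logits are bounded, the softmax cannot overflow regardless of sequence content, which is the property the fused Flash-Attention-style kernel exploits.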
Alternatives and similar repositories for flash-cosine-sim-attention
Users interested in flash-cosine-sim-attention are comparing it to the libraries listed below.
- Implementation of Flash Attention in Jax (☆227, updated Mar 1, 2024)
- ☆30, updated Oct 3, 2022
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena (☆207, updated Aug 26, 2023)
- Implementation of a U-net complete with efficient attention as well as the latest research findings (☆292, updated May 3, 2024)
- Implementation of a Transformer, but completely in Triton (☆279, updated Apr 5, 2022)
- My attempts at applying the Soundstream design to learned tokenization of text, then using hierarchical attention for text generation (☆90, updated Oct 11, 2024)
- Explorations into the recently proposed Taylor Series Linear Attention (☆100, updated Aug 18, 2024)
- ☆18, updated Oct 3, 2022
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch (☆135, updated Oct 15, 2025)
- Implementation of Insertion-deletion Denoising Diffusion Probabilistic Models (☆30, updated May 31, 2022)
- Implementation of a holodeck, written in Pytorch (☆18, updated Nov 1, 2023)
- Implementation of Discrete Key / Value Bottleneck, in Pytorch (☆88, updated Jul 9, 2023)
- Graph neural network message passing reframed as a Transformer with local attention (☆70, updated Dec 24, 2022)
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights (☆19, updated Oct 9, 2022)
- ☆159, updated Sep 15, 2023
- Implementation of RETRO, Deepmind's retrieval-based attention net, in Pytorch (☆879, updated Oct 30, 2023)
- Named tensors with first-class dimensions for PyTorch (☆332, updated Jun 14, 2023)
- Axial Positional Embedding for Pytorch (☆84, updated Feb 25, 2025)
- Triton Implementation of HyperAttention Algorithm (☆48, updated Dec 11, 2023)
- Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch (☆804, updated Jan 30, 2026)
- ☆82, updated Dec 1, 2023
- Here we will test various linear attention designs. (☆62, updated Apr 25, 2024)
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster. (☆1,078, updated Apr 17, 2024)
- DiCE: The Infinitely Differentiable Monte-Carlo Estimator (☆32, updated Jul 28, 2023)
- HGRN2: Gated Linear RNNs with State Expansion (☆56, updated Aug 20, 2024)
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" (☆27, updated Apr 17, 2024)
- Pytorch library for fast transformer implementations (☆1,765, updated Mar 23, 2023)
- 🦁 Lion, a new optimizer discovered by Google Brain using genetic algorithms, purportedly better than Adam(W), in Pytorch (☆2,183, updated Nov 27, 2024)
- Butterfly matrix multiplication in PyTorch (☆179, updated Oct 5, 2023)
- An implementation of (Induced) Set Attention Block, from the Set Transformers paper (☆67, updated Jan 10, 2023)
- Xmixers: A collection of SOTA efficient token/channel mixers (☆28, updated Sep 4, 2025)
- Pytorch implementation of Compressive Transformers, from Deepmind (☆163, updated Oct 4, 2021)
- Aggregating embeddings over time (☆32, updated Jan 19, 2023)
- Unofficial implementation of https://arxiv.org/abs/2112.05682 to get linear memory cost for attention in PyTorch (☆12, updated Jan 16, 2022)
- Efficient PScan implementation in PyTorch (☆17, updated Jan 2, 2024)
- Directed masked autoencoders (☆14, updated Mar 17, 2026)
- FFCV: Fast Forward Computer Vision (and other ML workloads!) (☆2,986, updated Jun 16, 2024)
- Un-*** 50 billion multimodality dataset (☆23, updated Sep 14, 2022)
- Implementation of the convolutional module from the Conformer paper, for use in Transformers (☆433, updated May 17, 2023)
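Among the entries above, the Lion optimizer has an update rule simple enough to state in a few lines: the step direction is the sign of an interpolation between the momentum and the current gradient, with decoupled weight decay. The NumPy sketch below follows the published rule from "Symbolic Discovery of Optimization Algorithms"; it is an illustration, not the listed repository's PyTorch implementation, and the hyperparameter values are the commonly cited defaults.

```python
import numpy as np

def lion_step(w, g, m, lr=0.01, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update. The step direction is the sign of an
    interpolation between the momentum m and the current gradient g."""
    update = np.sign(beta1 * m + (1 - beta1) * g)  # every entry is +/-1 (or 0)
    w = w - lr * (update + wd * w)                 # decoupled weight decay
    m = beta2 * m + (1 - beta2) * g                # momentum tracked with beta2
    return w, m

# One step on f(w) = sum(w**2), gradient g = 2*w, momentum starting at 0.
# With m = 0 the sign term reduces to sign(g), so every coordinate moves
# by exactly lr toward zero:
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
w, m = lion_step(w, 2 * w, m)
print(w)  # [ 0.99 -1.99]
```

Because the update magnitude is the same for every parameter, Lion's effective step size is set entirely by the learning rate, which is why the paper pairs it with a smaller learning rate than AdamW.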