Implementation of a memory-efficient multi-head attention, as proposed in the paper "Self-attention Does Not Need O(n²) Memory"
☆389 · Updated Jul 18, 2023
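The core trick of the paper referenced above is to process keys and values in chunks and merge the partial softmax results with a running log-sum-exp, so the full n×n attention matrix is never materialized. A minimal NumPy sketch of that idea follows; it is an illustration only, not the repo's actual API, and all function and variable names here are invented for the example:

```python
import numpy as np

def chunked_attention(q, k, v, chunk_size=64):
    # q, k, v: (n, d) arrays. Only one (n, chunk_size) score block is ever
    # held in memory, instead of the full (n, n) matrix.
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(v, dtype=np.float64)
    lse = np.full(n, -np.inf)  # running log-sum-exp of scores per query row
    for start in range(0, n, chunk_size):
        kc = k[start:start + chunk_size]
        vc = v[start:start + chunk_size]
        scores = (q @ kc.T) * scale                      # (n, chunk) block
        chunk_max = scores.max(axis=1)
        exp_scores = np.exp(scores - chunk_max[:, None])  # stable exponentials
        chunk_lse = chunk_max + np.log(exp_scores.sum(axis=1))
        new_lse = np.logaddexp(lse, chunk_lse)
        # Rescale what was accumulated so far, then add this chunk's
        # contribution; both are normalized by the updated log-sum-exp.
        out = out * np.exp(lse - new_lse)[:, None] \
            + (exp_scores @ vc) * np.exp(chunk_max - new_lse)[:, None]
        lse = new_lse
    return out

def standard_attention(q, k, v):
    # Reference implementation that materializes the full score matrix.
    scores = (q @ k.T) / np.sqrt(q.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ v
```

Both functions compute the same result; the chunked version just trades a single large intermediate for a loop over key/value blocks, which is why its peak memory scales with the chunk size rather than the sequence length.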
Alternatives and similar repositories for memory-efficient-attention-pytorch
Users interested in memory-efficient-attention-pytorch are comparing it to the libraries listed below.
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆220 · Updated Feb 13, 2023
- Implementation of Tranception, an attention network, paired with retrieval, that is SOTA for protein fitness prediction ☆32 · Updated Jun 19, 2022
- Memory Efficient Attention (O(sqrt(n))) for Jax and PyTorch ☆184 · Updated Jan 6, 2023
- Another attempt at a long-context / efficient transformer by me ☆38 · Updated Apr 11, 2022
- Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate … ☆641 · Updated Jul 17, 2023
- Implementation of Nyström Self-attention, from the paper Nyströmformer ☆145 · Updated Mar 24, 2025
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆207 · Updated Aug 26, 2023
- Implementation of a U-net complete with efficient attention as well as the latest research findings ☆292 · Updated May 3, 2024
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆46 · Updated May 23, 2023
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 · Updated Oct 22, 2023
- Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch ☆804 · Updated Jan 30, 2026
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing ☆49 · Updated Jan 27, 2022
- Implementation of a Transformer, but completely in Triton ☆279 · Updated Apr 5, 2022
- Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch ☆879 · Updated Oct 30, 2023
- Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch ☆231 · Updated Sep 6, 2024
- Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in P… ☆207 · Updated Feb 14, 2024
- 🦁 Lion, new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(w), in Pytorch ☆2,183 · Updated Nov 27, 2024
- Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning ☆166 · Updated Feb 12, 2024
- Pytorch implementation of Compressive Transformers, from Deepmind ☆163 · Updated Oct 4, 2021
- Vector (and Scalar) Quantization, in Pytorch ☆3,872 · Updated Feb 12, 2026
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆379 · Updated Jun 17, 2024
- Implementation of the Adan (ADAptive Nesterov momentum algorithm) Optimizer in Pytorch ☆253 · Updated Sep 1, 2022
- Implementation of Uformer, Attention-based Unet, in Pytorch ☆96 · Updated Oct 26, 2021
- Fast and memory-efficient exact attention ☆22,938 · Updated this week
- Implementation of the algorithm detailed in the paper "Evolutionary design of molecules based on deep learning and a genetic algorithm" ☆24 · Updated Dec 15, 2023
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Updated Oct 9, 2022
- Implementation of Block Recurrent Transformer - Pytorch ☆224 · Updated Aug 20, 2024
- Hackable and optimized Transformers building blocks, supporting a composable construction ☆10,388 · Updated Mar 18, 2026
- Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways ☆828 · Updated Nov 9, 2022
- GAU-alpha-pytorch ☆20 · Updated May 11, 2022
- Implementation of Parti, Google's pure attention-based text-to-image neural network, in Pytorch ☆537 · Updated Dec 8, 2023
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models ☆341 · Updated Feb 23, 2025
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" ☆59 · Updated Oct 22, 2023
- Implementation of Discrete Key / Value Bottleneck, in Pytorch ☆88 · Updated Jul 9, 2023
- FFCV: Fast Forward Computer Vision (and other ML workloads!) ☆2,986 · Updated Jun 16, 2024
- Unofficial implementation of Face0 with SDXL ☆12 · Updated Sep 1, 2023
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster ☆1,078 · Updated Apr 17, 2024
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch ☆25 · Updated Jan 21, 2025
- Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others) ☆9,438 · Updated Feb 20, 2026