lucidrains / memory-efficient-attention-pytorchLinks

Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"

☆384

Alternatives and similar repositories for memory-efficient-attention-pytorch

Users that are interested in memory-efficient-attention-pytorch are comparing it to the libraries listed below

Sorting:

lucidrains / flash-cosine-sim-attention
Implementation of fused cosine similarity attention in the same style as Flash Attention
☆218Updated 2 years ago
AminRezaei0x443 / memory-efficient-attention
Memory Efficient Attention (O(sqrt(n)) for Jax and PyTorch
☆184Updated 2 years ago
lucidrains / local-attention
An implementation of local windowed attention for language modeling
☆488Updated 4 months ago
lucidrains / linformer
Implementation of Linformer for Pytorch
☆302Updated last year
facebookresearch / mega
Sequence modeling with Mega.
☆301Updated 2 years ago
lucidrains / rotary-embedding-torch
Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch
☆780Updated 4 months ago
lucidrains / block-recurrent-transformer-pytorch
Implementation of Block Recurrent Transformer - Pytorch
☆222Updated last year
lucidrains / CoLT5-attention
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
☆230Updated last year
bobby-he / simplified_transformers
☆293Updated 11 months ago
lucidrains / Mega-pytorch
Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
☆207Updated 2 years ago
lucidrains / st-moe-pytorch
Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch
☆373Updated last year
lucidrains / Adan-pytorch
Implementation of the Adan (ADAptive Nesterov momentum algorithm) Optimizer in Pytorch
☆252Updated 3 years ago
lucidrains / simple-hierarchical-transformer
Experiments around a simple idea for inducing multiple hierarchical predictive model within a GPT
☆224Updated last year
lucidrains / ema-pytorch
A simple way to keep track of an Exponential Moving Average (EMA) version of your Pytorch model
☆624Updated last year
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆277Updated 3 years ago
lucidrains / linear-attention-transformer
Transformer based on a variant of attention that is linear complexity in respect to sequence length
☆815Updated last year
lucidrains / flash-attention-jax
Implementation of Flash Attention in Jax
☆222Updated last year
srush / annotated-mamba
Annotated version of the Mamba paper
☆491Updated last year
facebookresearch / dropout
Code release for "Dropout Reduces Underfitting"
☆315Updated 2 years ago
changjonathanc / minLoRA
minLoRA: a minimal PyTorch library that allows you to apply LoRA to any PyTorch model.
☆489Updated 2 years ago
lucidrains / pytorch-custom-utils
Just some miscellaneous utility functions / decorators / modules related to Pytorch and Accelerate to help speed up implementation of new…
☆125Updated last year
lucidrains / soft-moe-pytorch
Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch
☆336Updated 8 months ago
lucidrains / recurrent-interface-network-pytorch
Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in P…
☆207Updated last year
HazyResearch / flash-fft-conv
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
☆334Updated 11 months ago
NVIDIA / transformer-ls
Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
☆228Updated 3 years ago
lucidrains / nystrom-attention
Implementation of Nyström Self-attention, from the paper Nyströmformer
☆141Updated 8 months ago
lucidrains / FLASH-pytorch
Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time"
☆371Updated 2 years ago
meta-pytorch / torcheval
A library that contains a rich collection of performant PyTorch model metrics, a simple interface to create new metrics, a toolkit to fac…
☆244Updated 2 months ago
kozistr / pytorch_optimizer
optimizer & lr scheduler & loss function collections in PyTorch
☆374Updated last week
ofirpress / attention_with_linear_biases
Code for the ALiBi method for transformer language models (ICLR 2022)
☆546Updated 2 years ago