lucidrains / flash-cosine-sim-attentionLinks

Implementation of fused cosine similarity attention in the same style as Flash Attention

☆219

Alternatives and similar repositories for flash-cosine-sim-attention

Users that are interested in flash-cosine-sim-attention are comparing it to the libraries listed below

Sorting:

lucidrains / Mega-pytorch
Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
☆207Updated 2 years ago
lucidrains / memory-efficient-attention-pytorch
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
☆384Updated 2 years ago
AminRezaei0x443 / memory-efficient-attention
Memory Efficient Attention (O(sqrt(n)) for Jax and PyTorch
☆184Updated 2 years ago
lucidrains / hourglass-transformer-pytorch
Implementation of Hourglass Transformer, in Pytorch, from Google and OpenAI
☆97Updated 3 years ago
lucidrains / Adan-pytorch
Implementation of the Adan (ADAptive Nesterov momentum algorithm) Optimizer in Pytorch
☆252Updated 3 years ago
HomebrewML / revlib
Simple and efficient RevNet-Library for PyTorch with XLA and DeepSpeed support and parameter offload
☆131Updated 3 years ago
lucidrains / gated-state-spaces-pytorch
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch
☆101Updated 2 years ago
NVIDIA / transformer-ls
Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
☆228Updated 3 years ago
lucidrains / fast-transformer-pytorch
Implementation of Fast Transformer in Pytorch
☆177Updated 4 years ago
lucidrains / autoregressive-linear-attention-cuda
CUDA implementation of autoregressive linear attention, with all the latest research findings
☆46Updated 2 years ago
google-research / diffstride
TF/Keras code for DiffStride, a pooling layer with learnable strides.
☆124Updated 3 years ago
lucidrains / long-short-transformer
Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch
☆120Updated 4 years ago
lucidrains / flash-attention-jax
Implementation of Flash Attention in Jax
☆223Updated last year
lucidrains / block-recurrent-transformer-pytorch
Implementation of Block Recurrent Transformer - Pytorch
☆223Updated last year
jiaweizzhao / ZerO-initialization
☆75Updated 3 years ago
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆277Updated 3 years ago
lucidrains / panoptic-transformer
Another attempt at a long-context / efficient transformer by me
☆38Updated 3 years ago
lucidrains / pytorch-custom-utils
Just some miscellaneous utility functions / decorators / modules related to Pytorch and Accelerate to help speed up implementation of new…
☆125Updated last year
facebookresearch / diffq
DiffQ performs differentiable quantization using pseudo quantization noise. It can automatically tune the number of bits used per weight …
☆237Updated 2 years ago
lucidrains / CoLT5-attention
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
☆230Updated last year
lucidrains / nystrom-attention
Implementation of Nyström Self-attention, from the paper Nyströmformer
☆141Updated 8 months ago
facebookresearch / mega
Sequence modeling with Mega.
☆301Updated 2 years ago
ctlllll / SGConv
☆164Updated 2 years ago
facebookresearch / dropout
Code release for "Dropout Reduces Underfitting"
☆317Updated 2 years ago
lucidrains / adam-atan2-pytorch
Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch
☆134Updated last month
rjbruin / flexconv
Code repository for the ICLR 2022 paper "FlexConv: Continuous Kernel Convolutions With Differentiable Kernel Sizes" https://openreview.ne…
☆116Updated 3 years ago
lucidrains / einops-exts
Implementation of some personal helper functions for Einops, my most favorite tensor manipulation library ❤️
☆57Updated 2 years ago
lucidrains / ponder-transformer
Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper
☆81Updated 4 years ago
ppwwyyxx / RAM-multiprocess-dataloader
Demystify RAM Usage in Multi-Process Data Loaders
☆205Updated 2 years ago
patil-suraj / vit-vqgan
JAX implementation ViT-VQGAN
☆82Updated 3 years ago