lucidrains / autoregressive-linear-attention-cudaLinks

CUDA implementation of autoregressive linear attention, with all the latest research findings

☆44

Alternatives and similar repositories for autoregressive-linear-attention-cuda

Users that are interested in autoregressive-linear-attention-cuda are comparing it to the libraries listed below

Sorting:

lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆89Updated last year
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆100Updated 11 months ago
lucidrains / quartic-transformer
Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens)
☆52Updated 4 months ago
lucidrains / coordinate-descent-attention
Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk
☆46Updated 2 years ago
lucidrains / mixture-of-attention
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
☆120Updated 9 months ago
ahennequ / pytorch-custom-mma
☆29Updated 2 years ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆79Updated last year
graphcore-research / out-of-the-box-fp8-training
Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.
☆46Updated last year
lucidrains / kalman-filtering-attention
Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction"
☆58Updated last year
lucidrains / hourglass-transformer-pytorch
Implementation of Hourglass Transformer, in Pytorch, from Google and OpenAI
☆91Updated 3 years ago
lucidrains / GAF-microbatch-pytorch
Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch
☆25Updated 6 months ago
shikaiqiu / compute-better-spent
☆53Updated 9 months ago
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆111Updated 6 months ago
lucidrains / self-reasoning-tokens-pytorch
Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto
☆56Updated last year
sustcsonglin / gated_linear_attention_layer
☆32Updated last year
lucidrains / product-key-memory
Standalone Product Key Memory module in Pytorch - for augmenting Transformer models
☆82Updated last year
lucidrains / simplicial-attention
Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make practical in Fast and Simplex, Ro…
☆41Updated 2 weeks ago
cloneofsimo / zeroshampoo
☆34Updated 10 months ago
lucidrains / gated-state-spaces-pytorch
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch
☆101Updated 2 years ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
drisspg / transformer_nuggets
A place to store reusable transformer components of my own creation or found on the interwebs
☆59Updated this week
gregorbachmann / scaling_mlps
☆51Updated last year
lucidrains / holodeck-pytorch
Implementation of a holodeck, written in Pytorch
☆18Updated last year
lucidrains / token-shift-gpt
Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing
☆50Updated 3 years ago
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆85Updated last year
eth-easl / fmengine
Utilities for Training Very Large Models
☆58Updated 10 months ago
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆54Updated last year
lucidrains / discrete-key-value-bottleneck-pytorch
Implementation of Discrete Key / Value Bottleneck, in Pytorch
☆88Updated 2 years ago
lucidrains / flash-cosine-sim-attention
Implementation of fused cosine similarity attention in the same style as Flash Attention
☆214Updated 2 years ago
crowsonkb / torch-dist-utils
Utilities for PyTorch distributed
☆24Updated 5 months ago