glassroom / heinsen_attention

Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)

☆24

Alternatives and similar repositories for heinsen_attention

Users who are interested in heinsen_attention are comparing it to the libraries listed below.

dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27 Updated last year
Doraemonzzz / nanoTransNormer
☆11 Updated 2 years ago
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆62 Updated last year
sustcsonglin / mamba-triton
☆50 Updated last year
Doraemonzzz / hgru2-pytorch
☆23 Updated last year
maximzubkov / fft-scan
Efficient PScan implementation in PyTorch
☆17 Updated last year
yikangshen / megablocks
☆20 Updated last year
sjelassi / transformers_ssm_copy
☆35 Updated last year
sustcsonglin / gated_linear_attention_layer
☆32 Updated last year
RobertCsordas / moe_layer
sigma-MoE layer
☆20 Updated last year
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆55 Updated last year
ethansmith2000 / TransformerExperiments
☆19 Updated 6 months ago
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆89 Updated last year
HazyResearch / prefix-linear-attention
☆57 Updated last year
jopetty / word-problem
Experiments on the impact of depth in transformers and SSMs.
☆38 Updated last month
OpenNLPLab / HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆66 Updated last year
Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆29 Updated 3 months ago
EleutherAI / rnngineering
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆32 Updated last year
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆46 Updated last year
nikhilvyas / SOAP_MUON
Combining SOAP and MUON
☆17 Updated 9 months ago
google-deepmind / spectral_ssm
☆34 Updated last year
kyegomez / Blockwise-Parallel-Transformer
32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.
☆49 Updated 2 years ago
acosharma / elita-transformer
Official Repository for Efficient Linear-Time Attention Transformers.
☆18 Updated last year
test-time-training / ttt-tk
☆41 Updated last month
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 Updated last year
kazuki-irie / kv-memory-brain
Official Code Repository for the paper "Key-value memory in the brain"
☆29 Updated 9 months ago
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38 Updated 5 months ago
habanero-lab / APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to…
☆27 Updated this week
johanwind / wind_rwkv
☆27 Updated 4 months ago
proger / nanokitchen
Parallel Associative Scan for Language Models
☆18 Updated last year