proger / nanokitchenLinks

Parallel Associative Scan for Language Models

☆17

Alternatives and similar repositories for nanokitchen

Users that are interested in nanokitchen are comparing it to the libraries listed below

Sorting:

srush / mamba-scans
Blog post
☆17Updated last year
sustcsonglin / mamba-triton
☆48Updated last year
maximzubkov / fft-scan
Efficient PScan implementation in PyTorch
☆16Updated last year
johnryan465 / pscan
☆39Updated last year
automl / unlocking_state_tracking
Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without…
☆17Updated 7 months ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
AndPotap / einsum-search
☆33Updated last year
sustcsonglin / gated_linear_attention_layer
☆31Updated last year
google-deepmind / spectral_ssm
☆34Updated last year
berlino / seq_icl
☆53Updated last year
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47Updated last year
jopetty / word-problem
Experiments on the impact of depth in transformers and SSMs.
☆36Updated 11 months ago
eamartin / parallelizing_linear_rnns
☆44Updated 7 years ago
srush / tangent
Source-to-Source Debuggable Derivatives in Pure Python
☆15Updated last year
emalach / LinearLM
Code for the paper: https://arxiv.org/pdf/2309.06979.pdf
☆21Updated last year
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆27Updated last year
HazyResearch / prefix-linear-attention
☆56Updated last year
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆88Updated last year
shikaiqiu / compute-better-spent
☆58Updated last year
yikangshen / megablocks
☆20Updated last year
Doraemonzzz / Awesome-Triton-Resources
Awesome Triton Resources
☆36Updated 6 months ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80Updated last year
lxxue / prefix_sum
A PyTorch wrapper of parallel exclusive scan in CUDA
☆12Updated 2 years ago
EleutherAI / rnngineering
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆32Updated last year
test-time-training / ttt-tk
☆41Updated last week
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆61Updated last year
Ryu1845 / hyena-jax
Implementation of Hyena Hierarchy in JAX
☆10Updated 2 years ago
sjelassi / transformers_ssm_copy
☆33Updated last year
irhum / hyena
JAX/Flax implementation of the Hyena Hierarchy
☆34Updated 2 years ago
edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆23Updated 4 months ago