hazan-lab / flash-stu

PyTorch implementation of the Flash Spectral Transform Unit.

☆17

Alternatives and similar repositories for flash-stu

Users who are interested in flash-stu are comparing it to the libraries listed below.

GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆26 Updated 9 months ago
Doraemonzzz / Awesome-Triton-Resources
Awesome Triton Resources
☆32 Updated 2 months ago
habanero-lab / APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to…
☆24 Updated 3 weeks ago
emalach / LinearLM
Code for the paper: https://arxiv.org/pdf/2309.06979.pdf
☆19 Updated 11 months ago
dame-cell / Triformer
Transformers components but in Triton
☆34 Updated 2 months ago
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆60 Updated last year
belindal / state-tracking
Code and data for paper "(How) do Language Models Track State?"
☆14 Updated 3 months ago
Ryu1845 / hyena-jax
Implementation of Hyena Hierarchy in JAX
☆10 Updated 2 years ago
Aleph-Alpha-Research / NeurIPS-WANT-submission-efficient-parallelization-layouts
☆22 Updated last year
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆79 Updated last year
Dao-AILab / gemm-cublas
☆21 Updated 2 months ago
xiayuqing0622 / flex_head_fa
Fast and memory-efficient exact attention
☆68 Updated 4 months ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 Updated last year
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24 Updated last year
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆16 Updated 7 months ago
NX-AI / flashrnn
FlashRNN - Fast RNN Kernels with I/O Awareness
☆92 Updated last month
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆44 Updated 9 months ago
BBuf / flash-rwkv
☆31 Updated last year
lucidrains / simplicial-attention
Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make practical in Fast and Simplex, Ro…
☆34 Updated this week
OpenNLPLab / LASP
Linear Attention Sequence Parallelism (LASP)
☆85 Updated last year
proger / nanokitchen
Parallel Associative Scan for Language Models
☆18 Updated last year
softmax1 / Flash-Attention-Softmax-N
CUDA and Triton implementations of Flash Attention with SoftmaxN.
☆70 Updated last year
sustcsonglin / mamba-triton
☆48 Updated last year
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆83 Updated 3 weeks ago
AndPotap / einsum-search
☆32 Updated 9 months ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆142 Updated last month
UmerHA / triton_util
Make triton easier
☆47 Updated last year
kazuki-irie / kv-memory-brain
Official Code Repository for the paper "Key-value memory in the brain"
☆27 Updated 4 months ago
maximzubkov / fft-scan
Efficient PScan implementation in PyTorch
☆16 Updated last year
NX-AI / mlstm_kernels
Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels.
☆64 Updated 2 weeks ago