GindaChen / FlexFlashAttention3
View external linksLinks

FlexAttention w/ FlashAttention3 Support

☆27

Alternatives and similar repositories for FlexFlashAttention3

Users that are interested in FlexFlashAttention3 are comparing it to the libraries listed below

Sorting:

lianakoleva / no-libtorch-compile
View on GitHub
☆21Mar 3, 2025Updated 11 months ago
habanero-lab / APPy
View on GitHub
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to…
☆30Jan 28, 2026Updated 2 weeks ago
TiledTensor / TiledBench
View on GitHub
Benchmark tests supporting the TiledCUDA library.
☆18Nov 19, 2024Updated last year
renll / SeqBoat
View on GitHub
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆40Dec 2, 2023Updated 2 years ago
srush / tangent
View on GitHub
Source-to-Source Debuggable Derivatives in Pure Python
☆15Jan 23, 2024Updated 2 years ago
cchan / tccl
View on GitHub
extensible collectives library in triton
☆95Mar 31, 2025Updated 10 months ago
proger / nanokitchen
View on GitHub
Parallel Associative Scan for Language Models
☆18Jan 8, 2024Updated 2 years ago
yikangshen / megablocks
View on GitHub
☆20May 30, 2024Updated last year
EleutherAI / rnngineering
View on GitHub
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆32May 25, 2024Updated last year
ethansmith2000 / TransformerExperiments
View on GitHub
☆19Dec 4, 2025Updated 2 months ago
subho406 / agalite
View on GitHub
AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning (Published in TMLR)
☆23Oct 15, 2024Updated last year
Noahs-ARK / PaLM
View on GitHub
PyTorch implementation for PaLM: A Hybrid Parser and Language Model.
☆10Jan 7, 2020Updated 6 years ago
rycolab / aflt-f2023
View on GitHub
Advanced Formal Language Theory (263-5352-00L; Frühjahr 2023)
☆10Feb 21, 2023Updated 2 years ago
Doraemonzzz / nanoTransNormer
View on GitHub
☆11Oct 11, 2023Updated 2 years ago
dame-cell / Triformer
View on GitHub
Transformers components but in Triton
☆34May 9, 2025Updated 9 months ago
bdusell / stack-attention
View on GitHub
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18Mar 15, 2024Updated last year
IBM / triton-dejavu
View on GitHub
Framework to reduce autotune overhead to zero for well known deployments.
☆96Sep 19, 2025Updated 4 months ago
Aleph-Alpha-Research / NeurIPS-WANT-submission-efficient-parallelization-layouts
View on GitHub
☆22Dec 15, 2023Updated 2 years ago
TRI-ML / linear_open_lm
View on GitHub
A repository for research on medium sized language models.
☆77May 23, 2024Updated last year
OpenNLPLab / ETSC-Exact-Toeplitz-to-SSM-Conversion
View on GitHub
[EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion our EMNLP 2023 paper - Accelerating Toeplitz…
☆14Oct 17, 2023Updated 2 years ago
softmax1 / Flash-Attention-Softmax-N
View on GitHub
CUDA and Triton implementations of Flash Attention with SoftmaxN.
☆73May 26, 2024Updated last year
glassroom / heinsen_attention
View on GitHub
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24Jun 6, 2024Updated last year
Doraemonzzz / hgru-pytorch
View on GitHub
☆29Jul 9, 2024Updated last year
AlirezaMorsali / MLP-Attention
View on GitHub
☆16Dec 19, 2024Updated last year
HazyResearch / train-tk
View on GitHub
train with kittens!
☆63Oct 25, 2024Updated last year
dangxingyu / rnn-icrag
View on GitHub
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Apr 17, 2024Updated last year
feifeibear / ChituAttention
View on GitHub
Quantized Attention on GPU
☆44Nov 22, 2024Updated last year
FFTYYY / mhc-lite
View on GitHub
mHC-lite: You Don’t Need 20 Sinkhorn-Knopp Iterations
☆65Jan 12, 2026Updated last month
nanowell / Q-Sparse-LLM
View on GitHub
My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
☆33Aug 14, 2024Updated last year
kazuki-irie / kv-memory-brain
View on GitHub
Official Code Repository for the paper "Key-value memory in the brain"
☆31Feb 25, 2025Updated 11 months ago
xdit-project / DiTCacheAnalysis
View on GitHub
An auxiliary project analysis of the characteristics of KV in DiT Attention.
☆32Nov 29, 2024Updated last year
sjelassi / transformers_ssm_copy
View on GitHub
☆35Feb 26, 2024Updated last year
Doraemonzzz / Awesome-Triton-Resources
View on GitHub
Awesome Triton Resources
☆39Apr 27, 2025Updated 9 months ago
machelreid / editpro
View on GitHub
Learning to Model Editing Processes
☆26Aug 3, 2025Updated 6 months ago
OpenNLPLab / HGRN2
View on GitHub
HGRN2: Gated Linear RNNs with State Expansion
☆56Aug 20, 2024Updated last year
hpcgarage / cuASR
View on GitHub
cuASR: CUDA Algebra for Semirings
☆44Aug 22, 2022Updated 3 years ago
berlino / seq_icl
View on GitHub
☆53May 20, 2024Updated last year
ptillet / triton-llvm-releases
View on GitHub
☆20Oct 11, 2023Updated 2 years ago
TiledTensor / TiledKernel
View on GitHub
TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.
☆19May 12, 2024Updated last year

GindaChen / FlexFlashAttention3View external linksLinks

Alternatives and similar repositories for FlexFlashAttention3

GindaChen / FlexFlashAttention3
View external linksLinks