Yifei-Zuo/FlashLLA

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Yifei-Zuo/FlashLLA)

Yifei-Zuo / FlashLLA

Official repository Flash Local Linear Attention

☆37

Alternatives and similar repositories for FlashLLA

Users that are interested in FlashLLA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Yifei-Zuo / Parallax
View on GitHub
Official repository for Parallax (Parameterized Local Linear Attention)
☆65Jul 7, 2026Updated last week
catswe / flash-attention-residuals
View on GitHub
Triton kernels and PyTorch ops for Block Attention Residuals (AttnRes)
☆85May 29, 2026Updated last month
0xWelt / VibeRL
View on GitHub
VibeRL is a Reinforcement Learning framework built essentially through vibe coding with Kimi K2.
☆17Jul 13, 2026Updated last week
Triang-jyed-driung / i8muon
View on GitHub
Muon in Int8 Precision Made Possible
☆20Jun 18, 2026Updated last month
richardodliu / OpenCodeEval
View on GitHub
☆52Mar 9, 2026Updated 4 months ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
Doraemonzzz / xmixers
View on GitHub
Xmixers: A collection of SOTA efficient token/channel mixers
☆28Sep 4, 2025Updated 10 months ago
TiledTensor / TiledBench
View on GitHub
Benchmark tests supporting the TiledCUDA library.
☆19Nov 19, 2024Updated last year
Dao-AILab / gemm-cublas
View on GitHub
☆22May 5, 2025Updated last year
jxiw / M1
View on GitHub
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
☆48Jul 17, 2025Updated last year
open-lm-engine / accelerated-model-architectures
View on GitHub
A bunch of kernels that might make stuff slower 😉
☆91Updated this week
kazuki-irie / kv-memory-brain
View on GitHub
Official Code Repository for the paper "Key-value memory in the brain"
☆32Feb 25, 2025Updated last year
xlite-dev / qwen-image-fast
View on GitHub
⚡️Qwen-Image 4.8x🎉 speedup with Hybrid Acceleration for low VRAM GPUs
☆17Oct 24, 2025Updated 8 months ago
ethansmith2000 / TransformerExperiments
View on GitHub
☆19Dec 4, 2025Updated 7 months ago
goombalab / raven
View on GitHub
☆77May 29, 2026Updated last month
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
YJMSTR / flash-linear-attention
View on GitHub
FLA but cuTile
☆27Apr 17, 2026Updated 3 months ago
AbanteAI / LoCoDiff-bench
View on GitHub
☆33Oct 15, 2025Updated 9 months ago
tile-ai / AttentionEngine
View on GitHub
☆52May 19, 2025Updated last year
cherichy / tilecute
View on GitHub
☆32Jul 2, 2025Updated last year
sablin39 / tilelang-cuda-skills
View on GitHub
Skills for writing tilelang and debugging with CUDA toolkits.
☆131May 20, 2026Updated 2 months ago
fla-org / hybrid-distillation
View on GitHub
☆34Dec 31, 2025Updated 6 months ago
flashinfer-ai / cubloaty
View on GitHub
a size profiler for cuda binary
☆71Jan 15, 2026Updated 6 months ago
maximzubkov / fft-scan
View on GitHub
Efficient PScan implementation in PyTorch
☆17Jan 2, 2024Updated 2 years ago
flash-bon / flash-bon
View on GitHub
(ECCV 2026): Official code for Flash-BoN: Instant Drafts for Inference-Time Scaling in Diffusion Models
☆17Jul 9, 2026Updated last week
End-to-end encrypted cloud storage - Proton Drive • Ad
Special offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
mit-han-lab / flash-moba
View on GitHub
☆250Nov 19, 2025Updated 8 months ago
xlite-dev / ffpa-attn
View on GitHub
🤖FFPA: Extends FA-2/3 via Split-D for large headdims, 1.5x~6×↑🎉 vs SDPA, up to 513~535 TFLOPS🎉 on NVIDIA H200.
☆315Updated this week
zhehangdu / Newton-Muon
View on GitHub
The Newton-Muon optimizer
☆30Jun 5, 2026Updated last month
hazan-lab / flash-stu
View on GitHub
PyTorch implementation of the Flash Spectral Transform Unit.
☆22Sep 19, 2024Updated last year
zhuzilin / flash-attention-with-sink
View on GitHub
☆37Aug 7, 2025Updated 11 months ago
luliyucoordinate / cute-flash-attention
View on GitHub
Implement Flash Attention using Cute.
☆108Dec 17, 2024Updated last year
xxyux / SpInfer
View on GitHub
SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
☆68Mar 25, 2025Updated last year
sgl-project / DeepGEMM
View on GitHub
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
☆32Updated this week
AndreSlavescu / mHC.cu
View on GitHub
mHC kernels implemented in CUDA
☆264Mar 9, 2026Updated 4 months ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
tilde-research / comp-muon-release
View on GitHub
Compositional Muon release
☆23Jun 5, 2026Updated last month
bdusell / stack-attention
View on GitHub
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18Mar 15, 2024Updated 2 years ago
belindal / state-tracking
View on GitHub
Code and data for paper "(How) do Language Models Track State?"
☆26Mar 31, 2025Updated last year
OpenSparseLLMs / Linear-MoE
View on GitHub
☆139Jun 6, 2025Updated last year
zbh2047 / clipping-algorithms
View on GitHub
Code for the NeurIPS 2020 paper "Improved analysis of clippind algorithms for non-convex optimization", including various clipping algori…
☆10Feb 17, 2021Updated 5 years ago
Doraemonzzz / nanoTransNormer
View on GitHub
☆11Oct 11, 2023Updated 2 years ago
da03 / criticize_text_generation
View on GitHub
A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …
☆12Mar 18, 2023Updated 3 years ago