Yifei-Zuo / Flash-LLA
Official repository for the paper "Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression"
☆23 · Updated Oct 1, 2025 (4 months ago)
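The paper's core idea is an attention mechanism that interpolates between linear and softmax attention. As a rough illustration of the two endpoints only, here is a minimal PyTorch sketch of a generic gated blend; the function names and the scalar gate `alpha` are hypothetical, and this is not the paper's actual local formulation:

```python
# Hypothetical sketch of blending linear and softmax attention.
# NOT the Flash-LLA method; see the paper for the actual formulation.
import torch
import torch.nn.functional as F

def softmax_attn(q, k, v):
    # Causal softmax attention on (B, T, D) inputs.
    T = q.shape[1]
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=q.device), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

def linear_attn(q, k, v, eps=1e-6):
    # Causal linear attention with an ELU+1 feature map (Katharopoulos et al., 2020).
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum("btd,bte->btde", k, v).cumsum(dim=1)  # running sum of k v^T
    z = k.cumsum(dim=1)                                     # running sum of k
    num = torch.einsum("btd,btde->bte", q, kv)
    den = torch.einsum("btd,btd->bt", q, z).unsqueeze(-1) + eps
    return num / den

def blended_attn(q, k, v, alpha):
    # alpha in [0, 1]: 1 -> pure softmax attention, 0 -> pure linear attention.
    return alpha * softmax_attn(q, k, v) + (1 - alpha) * linear_attn(q, k, v)

q = k = v = torch.randn(2, 16, 8)
out = blended_attn(q, k, v, alpha=0.5)  # (2, 16, 8)
```

Per the title, the paper's interpolation is local and optimal in a test-time-regression sense; the global scalar blend above only conveys the two endpoints being traded off.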
Alternatives and similar repositories for Flash-LLA
Users interested in Flash-LLA are comparing it to the libraries listed below.
- High-performance RMSNorm implementation using on-SM storage (registers and shared memory); a reference sketch of the computation follows this list. ☆26 · Updated Jan 22, 2026 (3 weeks ago)
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. ☆20 · Updated Jan 24, 2025 (last year)
- ☆38 · Updated Aug 7, 2025 (6 months ago)
- Xmixers: a collection of SOTA efficient token/channel mixers. ☆28 · Updated Sep 4, 2025 (5 months ago)
- FlashTile is a CUDA Tile IR compiler compatible with NVIDIA's tileiras, targeting SM70 through SM121 GPUs. ☆37 · Updated Feb 6, 2026 (last week)
- ☆52 · Updated May 19, 2025 (8 months ago)
- An experimental communicating attention kernel based on DeepEP. ☆35 · Updated Jul 29, 2025 (6 months ago)
- [WIP] Better (FP8) attention for Hopper. ☆32 · Updated Feb 24, 2025 (11 months ago)
- LLaVA combined with the MAGVIT image tokenizer, training an MLLM without a vision encoder and unifying image understanding and generation.
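For reference, the RMSNorm repository above fuses the standard RMSNorm computation on-chip. Below is a plain PyTorch sketch of that computation (the standard definition from Zhang & Sennrich, 2019, with an `eps` term for numerical stability); nothing here reflects that repo's actual register/shared-memory kernel:

```python
# Reference RMSNorm: the computation a fused on-SM CUDA kernel would
# accelerate. Plain PyTorch sketch, unrelated to the repo's implementation.
import torch

def rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6):
    # Normalize by the root-mean-square over the last dimension, then scale.
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x / rms * weight

x = torch.randn(4, 1024)
w = torch.ones(1024)
y = rmsnorm(x, w)  # same shape as x; each row has unit RMS before scaling
```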