Yifei-Zuo / Flash-LLA
Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
⭐ 23 · Updated 3 weeks ago
Alternatives and similar repositories for Flash-LLA
Users interested in Flash-LLA are comparing it to the libraries listed below.
- ⭐ 130 · Updated 4 months ago
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. The official implementation of https://arx… ⭐ 26 · Updated 8 months ago
- Quantized Attention on GPU ⭐ 44 · Updated 11 months ago
- ⭐ 50 · Updated 5 months ago
- Fast and memory-efficient exact kmeans ⭐ 111 · Updated 3 weeks ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel ⭐ 119 · Updated 4 months ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code ⭐ 46 · Updated 3 months ago
- Xmixers: A collection of SOTA efficient token/channel mixers ⭐ 29 · Updated last month
- ⭐ 120 · Updated 2 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ⭐ 195 · Updated 4 months ago
- ⭐ 42 · Updated this week
- Transformers components but in Triton ⭐ 34 · Updated 5 months ago
- ⭐ 101 · Updated 5 months ago
- Triton implementation of bi-directional (non-causal) linear attention ⭐ 56 · Updated 8 months ago
- ⭐ 93 · Updated 8 months ago
- [COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models" ⭐ 55 · Updated 3 months ago
- ⭐ 18 · Updated 10 months ago
- [ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity ⭐ 58 · Updated 3 months ago
- Code for paper: [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ⭐ 147 · Updated last week
- ⭐ 21 · Updated 7 months ago
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference ⭐ 54 · Updated 11 months ago
- The evaluation framework for training-free sparse attention in LLMs ⭐ 101 · Updated last week
- ⭐ 39 · Updated 2 months ago
- [ICML 2024] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs ⭐ 94 · Updated 11 months ago
- ⭐ 82 · Updated 9 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ⭐ 239 · Updated 3 months ago
- ⭐ 61 · Updated 3 months ago
- Flash-Linear-Attention models beyond language ⭐ 19 · Updated last month
- ⭐ 145 · Updated 8 months ago
- ⭐ 65 · Updated 6 months ago