mdy666 / Scalable-Flash-Native-Sparse-Attention

☆46

Alternatives and similar repositories for Scalable-Flash-Native-Sparse-Attention

Users interested in Scalable-Flash-Native-Sparse-Attention are comparing it to the libraries listed below.


Tencent-Hunyuan / flex-block-attn
flex-block-attn: an efficient block sparse attention computation library
☆65 · Updated this week
svg-project / flash-kmeans
Fast and memory-efficient exact kmeans
☆126 · Updated last week
Dao-AILab / grouped-latent-attention
☆132 · Updated 5 months ago
Yifei-Zuo / Flash-LLA
Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…
☆23 · Updated last month
mit-han-lab / x-attention
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
☆254 · Updated 4 months ago
mit-han-lab / flash-moba
☆143 · Updated last week
ByteDance-Seed / cudaLLM
☆121 · Updated 3 months ago
hao-ai-lab / Awesome-Video-Attention
A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, and cach…
☆48 · Updated 3 weeks ago
fla-org / flash-bidirectional-linear-attention
Triton implementation of bidirectional (non-causal) linear attention
☆56 · Updated 9 months ago
z-lab / sparselora
[ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
☆60 · Updated 4 months ago
NVlabs / COAT
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
☆249 · Updated 3 months ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆206 · Updated 5 months ago
XunhaoLai / native-sparse-attention-triton
Efficient Triton implementation of Native Sparse Attention.
☆248 · Updated 6 months ago
ByteDance-Seed / FlexPrefill
Code for the paper [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆154 · Updated last month
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆308 · Updated last week
liangyuwang / Tiny-FSDP
Tiny-FSDP, a minimalistic re-implementation of the PyTorch FSDP
☆90 · Updated 3 months ago
FasterDecoding / TEAL
☆148 · Updated 9 months ago
alexzhang13 / flashattention2-custom-mask
Triton implementation of FlashAttention2 that adds Custom Masks.
☆150 · Updated last year
TsinghuaC3I / Fourier-Position-Embedding
[ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization
☆103 · Updated 5 months ago
yuezhouhu / 2by4-pretrain
Efficient 2:4 sparse training algorithms and implementations
☆57 · Updated 11 months ago
zhijie-group / Discrete-Diffusion-Forcing
Discrete Diffusion Forcing (D2F): dLLMs Can Do Faster-Than-AR Inference
☆202 · Updated 2 months ago
sustcsonglin / linear-attention-and-beyond-slides
☆95 · Updated 9 months ago
sail-sg / SimLayerKV
The official implementation of the paper SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.
☆50 · Updated last year
JieShibo / MoLE
[ICML 2025 Oral] Mixture of Lookup Experts
☆55 · Updated 6 months ago
thu-ml / ReMoE
[ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆98 · Updated 11 months ago
feifeibear / Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
☆78 · Updated last year
microsoft / SeerAttention
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
☆171 · Updated 2 months ago
osayamenja / FlashMoE
Distributed MoE in a Single Kernel [NeurIPS '25]
☆125 · Updated last month
sramshetty / mixture-of-depths
An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆36 · Updated last year
maomaocun / dLLM-cache
Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache…
☆185 · Updated last week