svg-project / flash-kmeans
View external linksLinks

Fast and memory-efficient exact kmeans

☆138

Alternatives and similar repositories for flash-kmeans

Users that are interested in flash-kmeans are comparing it to the libraries listed below

Sorting:

KuangjuX / AttnLink
View on GitHub
An experimental communicating attention kernel based on DeepEP.
☆35Jul 29, 2025Updated 6 months ago
Mddct / transformer-vocos
View on GitHub
☆36Sep 6, 2025Updated 5 months ago
Doraemonzzz / xmixers
View on GitHub
Xmixers: A collection of SOTA efficient token/channel mixers
☆28Sep 4, 2025Updated 5 months ago
sgl-project / sgl-flash-attn
View on GitHub
Fast and memory-efficient exact attention
☆18Jan 23, 2026Updated 3 weeks ago
tile-ai / AttentionEngine
View on GitHub
☆52May 19, 2025Updated 8 months ago
svg-project / Sparse-VideoGen
View on GitHub
[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention
☆627Feb 3, 2026Updated last week
TiledTensor / TiledBench
View on GitHub
Benchmark tests supporting the TiledCUDA library.
☆18Nov 19, 2024Updated last year
yaof20 / Flash-RL
View on GitHub
Implementation for FP8/INT8 Rollout for RL training without performence drop.
☆290Nov 7, 2025Updated 3 months ago
flashinfer-ai / cutlass-viz
View on GitHub
☆65Apr 26, 2025Updated 9 months ago
flashinfer-ai / debug-print
View on GitHub
Debug print operator for cudagraph debugging
☆14Aug 2, 2024Updated last year
feifeibear / ChituAttention
View on GitHub
Quantized Attention on GPU
☆44Nov 22, 2024Updated last year
ademeure / DeeperGEMM
View on GitHub
DeeperGEMM: crazy optimized version
☆74May 5, 2025Updated 9 months ago
mit-han-lab / flash-moba
View on GitHub
☆221Nov 19, 2025Updated 2 months ago
microsoft / AttentionEngine
View on GitHub
☆118May 19, 2025Updated 8 months ago
ByteDance-Seed / Triton-distributed
View on GitHub
Distributed Compiler based on Triton for Parallel Systems
☆1,350Updated this week
WaveSpeedAI / QuantumAttention
View on GitHub
[WIP] Better (FP8) attention for Hopper
☆32Feb 24, 2025Updated 11 months ago
nil0x9 / flash-muon
View on GitHub
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆235Jun 15, 2025Updated 8 months ago
renll / SeqBoat
View on GitHub
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆40Dec 2, 2023Updated 2 years ago
dame-cell / Triformer
View on GitHub
Transformers components but in Triton
☆34May 9, 2025Updated 9 months ago
tile-ai / TileOPs
View on GitHub
☆86Updated this week
zhuzilin / flash-attention-with-sink
View on GitHub
☆38Aug 7, 2025Updated 6 months ago
GeeeekExplorer / transformers-patch
View on GitHub
patches for huggingface transformers to save memory
☆34Jun 2, 2025Updated 8 months ago
tilde-research / nsa-impl
View on GitHub
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆129Jun 24, 2025Updated 7 months ago
ByteDance-Seed / cudaLLM
View on GitHub
☆130Aug 18, 2025Updated 5 months ago
Dao-AILab / sonic-moe
View on GitHub
Accelerating MoE with IO and Tile-aware Optimizations
☆583Feb 6, 2026Updated last week
mit-han-lab / Quest
View on GitHub
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆372Jul 10, 2025Updated 7 months ago
serdes21 / flashtile
View on GitHub
FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.
☆37Feb 6, 2026Updated last week
SandAI-org / MagiAttention
View on GitHub
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
☆639Updated this week
Tencent-Hunyuan / flex-block-attn
View on GitHub
flex-block-attn: an efficient block sparse attention computation library
☆108Dec 26, 2025Updated last month
li-plus / flash-preference
View on GitHub
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆51Jul 4, 2025Updated 7 months ago
Yifei-Zuo / Flash-LLA
View on GitHub
Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…
☆23Oct 1, 2025Updated 4 months ago
tsinghua-ideal / Twilight
View on GitHub
[NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning
☆87Nov 29, 2025Updated 2 months ago
PiotrNawrot / nano-sparse-attention
View on GitHub
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
☆92Jul 17, 2025Updated 6 months ago
thu-ml / SpargeAttn
View on GitHub
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
☆939Dec 31, 2025Updated last month
glassroom / heinsen_attention
View on GitHub
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24Jun 6, 2024Updated last year
efeslab / fiddler
View on GitHub
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
☆260Nov 18, 2024Updated last year
yu-yake2002 / ysyx-docker
View on GitHub
A docker image for One Student One Chip's debug exam
☆10Sep 22, 2023Updated 2 years ago
uwsampl / paper-agents
View on GitHub
☆13Dec 9, 2024Updated last year
sherwinbahmani / threed_front_rendering
View on GitHub
☆13Sep 2, 2023Updated 2 years ago

svg-project / flash-kmeansView external linksLinks

Alternatives and similar repositories for flash-kmeans

svg-project / flash-kmeans
View external linksLinks