sgl-project/sgl-flash-attn

Fast and memory-efficient exact attention

☆18

Alternatives and similar repositories for sgl-flash-attn

Users who are interested in sgl-flash-attn are comparing it to the repositories listed below.

yzlnew / infra-skills
A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-perfo…
☆88 · Feb 2, 2026 · Updated last month
li-plus / flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆51 · Jul 4, 2025 · Updated 8 months ago
zhuzilin / flash-attention-with-sink
☆38 · Aug 7, 2025 · Updated 7 months ago
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆74 · May 5, 2025 · Updated 10 months ago
MingXiangL / Teacache-xDiT
Combining Teacache with xDiT to Accelerate Visual Generation Models
☆32 · Apr 21, 2025 · Updated 10 months ago
NoakLiu / FastCache-xDiT
FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation [Efficient ML Model]
☆46 · Feb 17, 2026 · Updated 2 weeks ago
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆42 · Dec 29, 2025 · Updated 2 months ago
NickL77 / BaldEagle
3x Faster Inference; Unofficial implementation of EAGLE Speculative Decoding
☆83 · Jul 3, 2025 · Updated 8 months ago
ByteDance-Seed / FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆161 · Oct 13, 2025 · Updated 4 months ago
svg-project / flash-kmeans
Fast and memory-efficient exact kmeans
☆140 · Feb 18, 2026 · Updated 2 weeks ago
tilde-research / nsa-impl
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆129 · Jun 24, 2025 · Updated 8 months ago
microsoft / RetrievalAttention
[VLDB 26, NeurIPS 25] Scalable long-context LLM decoding that leverages sparsity—by treating the KV cache as a vector storage system.
☆127 · Feb 22, 2026 · Updated 2 weeks ago
sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆275 · Updated this week
Yifei-Zuo / Flash-LLA
Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…
☆23 · Oct 1, 2025 · Updated 5 months ago
SuDIS-ZJU / nlcTables
☆15 · Jan 27, 2026 · Updated last month
deathwings602 / Unified-IR
A deep learning intermediate representation for multi-platform compilation optimization
☆10 · Oct 28, 2024 · Updated last year
AdaptInfer / context-review
☆14 · Jan 23, 2026 · Updated last month
ChuanyangZheng / L2ViT
Official PyTorch implementation of The Linear Attention Resurrection in Vision Transformer
☆16 · Sep 7, 2024 · Updated last year
LGAI-Research / SetR
☆20 · Sep 11, 2025 · Updated 5 months ago
oliverhu / rama
llama2 inference engine in Rust
☆13 · Apr 12, 2024 · Updated last year
saubury / GenPiCam
GenPiCam - a RaspberryPi based camera that reimagines the world with GenAI.
☆10 · Jun 28, 2023 · Updated 2 years ago
DerrickYLJ / TidalDecode
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆53 · Aug 6, 2025 · Updated 7 months ago
Elluran / concentration_notebooks
☆11 · Dec 11, 2024 · Updated last year
zyxxmu / cam
PyTorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference
☆47 · Jun 19, 2024 · Updated last year
sangminwoo / awesome-token-redundancy-reduction
😎 Awesome papers on token redundancy reduction
☆11 · Mar 12, 2025 · Updated 11 months ago
MoFHeka / execution-ucx
A std::execution style runtime context and High Performance RPC Transport for using OpenUCX. Including CUDA/ROCM/... devices with RDMA.
☆29 · Feb 22, 2026 · Updated 2 weeks ago
yeahdongcn / WFH
A list of companies that allow remote work (work from home)
☆16 · Mar 2, 2022 · Updated 4 years ago
yuxiang-gao / awesome-llm-blogs
Blogs that I'm actively following.
☆13 · Sep 17, 2023 · Updated 2 years ago
130B848 / ipads-tutorial07
☆10 · Dec 8, 2021 · Updated 4 years ago
shyhoom / T22_034_CRDDC_2022_SourceCode
T22_034_han_shi_hao_CRDDC_2022_SourceCode
☆11 · Dec 29, 2023 · Updated 2 years ago
bethelmelesse / UnifiedCrawl
☆16 · Nov 26, 2024 · Updated last year
homeport / yft
/j f t/ - YAML file tool
☆13 · Feb 9, 2026 · Updated 3 weeks ago
hatsu3 / curator
☆11 · Jan 17, 2024 · Updated 2 years ago
meeuw / unattended-windows-10
Packer templates to install Windows 10 Evaluation using the qemu/kvm builder.
☆12 · Sep 2, 2021 · Updated 4 years ago
Wucy0519 / VTinker
This is the official implementation of our paper: “VTinker: Guided Flow Upsampling and Texture Mapping for High-Resolution Video Frame In…
☆16 · Dec 5, 2025 · Updated 3 months ago
NetManAIOps / DOMI_code
code for DOMI
☆11 · Mar 24, 2023 · Updated 2 years ago
Halifuda / Xerxes
A standalone CXL-enabled system simulator.
☆19 · Jan 10, 2026 · Updated last month
GeeeekExplorer / 3d-parallel-demo
Implementing DP/TP/PP with torch.distributed
☆13 · Dec 28, 2023 · Updated 2 years ago
AniZpZ / smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
☆11 · Dec 13, 2023 · Updated 2 years ago