mengqin / SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
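A minimal sketch of how the kernel is typically invoked, assuming the `sageattn` entry point and the `tensor_layout`/`is_causal` keywords described in the project's README; verify the exact signature against the repository before use.

```python
# Sketch: SageAttention as a drop-in replacement for scaled dot-product
# attention. The sageattn() signature below is assumed from the project's
# README and may differ across versions.
import torch
from sageattention import sageattn  # pip install sageattention

batch, heads, seq_len, head_dim = 2, 8, 1024, 64
# "HND" layout: (batch, heads, seq_len, head_dim), matching PyTorch SDPA.
q = torch.randn(batch, heads, seq_len, head_dim,
                dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Quantized attention kernel; intended to match FlashAttention's output
# closely while running faster on supported GPUs.
out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)
```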

Alternatives and similar repositories for SageAttention

Users interested in SageAttention are comparing it to the libraries listed below.
