thu-ml / SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
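For context, the repository exposes a drop-in attention function. The sketch below follows the usage pattern shown in the SageAttention README; the tensor shapes, dtype, and keyword values here are illustrative assumptions, not verified defaults.

```python
# Minimal sketch of using SageAttention as a drop-in attention kernel.
# Shapes, dtype, and tensor_layout are illustrative assumptions.
import torch
from sageattention import sageattn

batch, heads, seq_len, head_dim = 2, 16, 4096, 128

# Q, K, V in (batch, heads, seq_len, head_dim) layout ("HND").
q = torch.randn(batch, heads, seq_len, head_dim,
                dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# sageattn quantizes Q/K internally and runs a fused attention kernel;
# per the repo's claim, end-to-end metrics match full-precision attention.
out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)
```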

Alternatives and similar repositories for SageAttention

Users interested in SageAttention are comparing it to the libraries listed below.
