thu-ml / SageAttention

Quantized Attention achieves a 2-5x speedup over FlashAttention and a 3-11x speedup over xformers, without losing end-to-end metrics across language, image, and video models.
1,766 stars · Updated last week
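
A minimal usage sketch, assuming the `sageattn` entry point documented in the repository's README; the exact signature and supported layouts should be verified against the current README before use. The idea is that it serves as a plug-and-play replacement for `torch.nn.functional.scaled_dot_product_attention`.

```python
# Sketch: using SageAttention's quantized attention kernel as a drop-in
# replacement for PyTorch's scaled_dot_product_attention.
# Assumes `sageattn(q, k, v, tensor_layout=..., is_causal=...)` as described
# in the repository README; verify against the installed version.
import torch
from sageattention import sageattn

# Example tensors in (batch, heads, sequence, head_dim) layout ("HND").
q = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")

# Quantized attention output, comparable to:
# torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=False)
out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)
```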

Alternatives and similar repositories for SageAttention

Users interested in SageAttention are comparing it to the libraries listed below.
