Bruce-Lee-LY / decoding_attention
Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores during the decoding stage of LLM inference.
46 · Jun 11, 2025 · Updated 10 months ago
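The repository's kernels are CUDA, but the computation they accelerate can be illustrated in plain NumPy. Below is a minimal sketch (not the library's actual API) of what decode-stage grouped-query attention computes: one new query token attends over a cached K/V history, with groups of query heads sharing a KV head. MHA is the special case `n_kv_heads == n_q_heads`, and MQA is `n_kv_heads == 1`. The function name and shapes are illustrative assumptions.

```python
import numpy as np

def gqa_decode_attention(q, k_cache, v_cache, n_kv_heads):
    """Single-token (decode-step) grouped-query attention sketch.

    q:        (n_q_heads, head_dim)          query for the new token
    k_cache:  (n_kv_heads, seq_len, head_dim) cached keys
    v_cache:  (n_kv_heads, seq_len, head_dim) cached values

    Each group of n_q_heads // n_kv_heads query heads shares one KV head.
    """
    n_q_heads, head_dim = q.shape
    group = n_q_heads // n_kv_heads
    scale = 1.0 / np.sqrt(head_dim)
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                       # KV head shared by this query head
        scores = k_cache[kv] @ q[h] * scale   # (seq_len,) attention logits
        scores -= scores.max()                # numerically stable softmax
        w = np.exp(scores)
        w /= w.sum()
        out[h] = w @ v_cache[kv]              # weighted sum of cached values
    return out
```

Because the decode step has a single query token, the work is a batch of matrix-vector products rather than matrix-matrix products, which is why CUDA-core (rather than Tensor-core) implementations can be competitive here.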
