Bruce-Lee-LY / decoding_attention
Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference.
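To illustrate what "decoding-stage attention" means here, the sketch below shows one decode step in pure Python: a single new query token attends over the cached keys and values. The function name, argument layout, and list-based tensors are illustrative assumptions for clarity, not the library's actual CUDA API; MHA, MQA, and GQA differ only in how many KV heads the query heads share (MLA's latent-compression step is omitted).

```python
import math

def decode_attention(q, k_cache, v_cache, num_kv_heads):
    """One decoding step of grouped-query attention in pure Python (illustrative only).

    q:        list of num_q_heads query vectors for the single new token
    k_cache:  k_cache[t][h] is the key vector of KV head h at position t
    v_cache:  same layout as k_cache, holding value vectors
    MHA is the special case num_kv_heads == num_q_heads; MQA is num_kv_heads == 1.
    """
    num_q_heads = len(q)
    group = num_q_heads // num_kv_heads
    head_dim = len(q[0])
    scale = 1.0 / math.sqrt(head_dim)
    out = []
    for h in range(num_q_heads):
        kv = h // group  # which shared KV head this query head reads
        # scaled dot-product scores against every cached position
        scores = [scale * sum(a * b for a, b in zip(q[h], k_cache[t][kv]))
                  for t in range(len(k_cache))]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
        z = sum(exps)
        probs = [e / z for e in exps]
        # weighted sum of the cached values
        out.append([sum(p * v_cache[t][kv][d] for t, p in enumerate(probs))
                    for d in range(head_dim)])
    return out
```

Because the query is a single token, this workload is memory-bandwidth-bound over the KV cache rather than compute-bound, which is why a CUDA-core implementation (as opposed to Tensor-Core GEMMs) can be the better fit for decoding.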
46 · Jun 11, 2025 · Updated 8 months ago

Alternatives and similar repositories for decoding_attention

Users interested in decoding_attention are comparing it to the libraries listed below.

