Bruce-Lee-LY / decoding_attention

Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA (multi-head, multi-query, grouped-query, and multi-head latent attention) using CUDA cores for the decoding stage of LLM inference.
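
To make the decoding-stage workload concrete, below is a minimal CUDA sketch of single-query (decode) attention for MHA: each head has exactly one query token that attends over the full KV cache. All names (`decode_attn_kernel`, the tensor layouts, the launch configuration) are illustrative assumptions, not this repository's actual API; real decode kernels add vectorized loads, warp-level reductions, and the MQA/GQA/MLA head mappings.

```cuda
// Hypothetical sketch of decode-stage attention, NOT the repo's actual kernel.
// One block per head; assumes seq_len <= blockDim.x and fp32 tensors.
#include <cuda_runtime.h>
#include <cstdio>
#include <cmath>

__global__ void decode_attn_kernel(const float* q,   // [num_heads, head_dim]
                                   const float* k,   // [num_heads, seq_len, head_dim]
                                   const float* v,   // [num_heads, seq_len, head_dim]
                                   float* out,       // [num_heads, head_dim]
                                   int seq_len, int head_dim) {
    extern __shared__ float scores[];                // [seq_len]
    int h = blockIdx.x;                              // head index
    int t = threadIdx.x;                             // key position handled by this thread

    // 1. Scaled dot-product score q . k_t for this thread's key position.
    if (t < seq_len) {
        float s = 0.f;
        for (int d = 0; d < head_dim; ++d)
            s += q[h * head_dim + d] * k[(h * seq_len + t) * head_dim + d];
        scores[t] = s * rsqrtf((float)head_dim);
    }
    __syncthreads();

    // 2. Softmax over the score row (serial in thread 0 for clarity;
    //    production kernels use warp/block reductions instead).
    if (t == 0) {
        float m = scores[0];
        for (int i = 1; i < seq_len; ++i) m = fmaxf(m, scores[i]);
        float sum = 0.f;
        for (int i = 0; i < seq_len; ++i) { scores[i] = expf(scores[i] - m); sum += scores[i]; }
        for (int i = 0; i < seq_len; ++i) scores[i] /= sum;
    }
    __syncthreads();

    // 3. Weighted sum of V; each thread accumulates output dimensions strided
    //    by blockDim.x.
    for (int d = t; d < head_dim; d += blockDim.x) {
        float acc = 0.f;
        for (int i = 0; i < seq_len; ++i)
            acc += scores[i] * v[(h * seq_len + i) * head_dim + d];
        out[h * head_dim + d] = acc;
    }
}

int main() {
    const int H = 2, S = 64, D = 128;                // heads, cached tokens, head dim
    float *q, *k, *v, *out;
    cudaMallocManaged(&q, H * D * sizeof(float));
    cudaMallocManaged(&k, H * S * D * sizeof(float));
    cudaMallocManaged(&v, H * S * D * sizeof(float));
    cudaMallocManaged(&out, H * D * sizeof(float));
    for (int i = 0; i < H * D; ++i) q[i] = 0.01f * (i % 7);
    for (int i = 0; i < H * S * D; ++i) { k[i] = 0.01f * (i % 5); v[i] = 0.01f * (i % 3); }
    decode_attn_kernel<<<H, 128, S * sizeof(float)>>>(q, k, v, out, S, D);
    cudaDeviceSynchronize();
    printf("out[0] = %f\n", out[0]);
    return 0;
}
```

The sketch also shows why CUDA cores fit this stage: with a query length of 1, the attention matmuls degenerate into memory-bound GEMV-like work, where Tensor Core tiles would sit mostly idle.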

Alternatives and similar repositories for decoding_attention: