Bruce-Lee-LY/decoding_attention

Decoding Attention is optimized specifically for MHA, MQA, GQA, and MLA (multi-head, multi-query, grouped-query, and multi-head latent attention) using CUDA cores for the decoding stage of LLM inference.
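For orientation, the sketch below shows the computation a decoding-stage attention kernel performs for plain MHA: a single query token attending over the full KV cache (scores, softmax, weighted sum of values). This is a minimal reference illustration, not the library's actual kernel; the function name, memory layout, and launch configuration are assumptions made for the example.

```cuda
// Hypothetical reference kernel, not decoding_attention's implementation.
// Assumed layout: q is [num_heads, head_dim], k and v are
// [seq_len, num_heads, head_dim], out is [num_heads, head_dim].
// Launch with one block per head and blockDim.x >= seq_len, e.g.:
//   decode_attention_ref<<<num_heads, seq_len, seq_len * sizeof(float)>>>(...);
__global__ void decode_attention_ref(const float* q, const float* k,
                                     const float* v, float* out,
                                     int seq_len, int num_heads, int head_dim) {
    int h = blockIdx.x;               // one thread block per attention head
    int t = threadIdx.x;              // one thread per cached timestep
    extern __shared__ float score[];  // seq_len attention scores for this head

    // 1. Scaled dot product of the single decode-step query with each cached key.
    if (t < seq_len) {
        float s = 0.f;
        for (int d = 0; d < head_dim; ++d)
            s += q[h * head_dim + d] * k[(t * num_heads + h) * head_dim + d];
        score[t] = s * rsqrtf((float)head_dim);
    }
    __syncthreads();

    // 2. Softmax over the scores (done serially in thread 0 for clarity;
    //    a real kernel would use a parallel reduction).
    if (t == 0) {
        float m = score[0];
        for (int i = 1; i < seq_len; ++i) m = fmaxf(m, score[i]);
        float z = 0.f;
        for (int i = 0; i < seq_len; ++i) {
            score[i] = expf(score[i] - m);
            z += score[i];
        }
        for (int i = 0; i < seq_len; ++i) score[i] /= z;
    }
    __syncthreads();

    // 3. Weighted sum of cached values; threads stride over the head_dim axis.
    for (int d = t; d < head_dim; d += blockDim.x) {
        float acc = 0.f;
        for (int i = 0; i < seq_len; ++i)
            acc += score[i] * v[(i * num_heads + h) * head_dim + d];
        out[h * head_dim + d] = acc;
    }
}
```

Because decoding processes one query token at a time, the kernel is memory-bound on reading the KV cache, which is why CUDA-core implementations like this one can compete with tensor-core kernels at this stage.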

Alternatives and similar repositories for decoding_attention:

Users interested in decoding_attention are comparing it to the libraries listed below.