Bruce-Lee-LY / decoding_attention

Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores during the decoding stage of LLM inference.
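In the decoding stage, each step attends a single new query token against the cached keys and values of all previous tokens, so the workload is a batch of small matrix-vector products rather than large matrix multiplies. A minimal single-head NumPy sketch of this decode-step attention (an illustration of the general pattern, not this repository's CUDA kernels) might look like:

```python
import numpy as np

def decode_attention(q, k_cache, v_cache):
    """One decode step of attention for a single head.

    q: (d,) query vector for the newly generated token
    k_cache, v_cache: (t, d) cached keys/values for the t prior tokens
    Returns the (d,) attention output for the new token.
    """
    d = q.shape[-1]
    scores = k_cache @ q / np.sqrt(d)   # (t,) similarity of q to each cached key
    scores -= scores.max()              # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()            # softmax over the t cached positions
    return weights @ v_cache            # (d,) weighted sum of cached values

# Hypothetical sizes for illustration only
rng = np.random.default_rng(0)
t, d = 8, 64
q = rng.standard_normal(d)
k_cache = rng.standard_normal((t, d))
v_cache = rng.standard_normal((t, d))
out = decode_attention(q, k_cache, v_cache)
print(out.shape)
```

Because the query is a single vector, the arithmetic intensity is low and the step is memory-bandwidth bound, which is why a decode-specialized CUDA-core implementation can pay off here.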
29 · Updated 3 months ago
