microsoft / RetrievalAttention

Scalable long-context LLM decoding that leverages sparsity—by treating the KV cache as a vector storage system.
94 · Updated last month

Alternatives and similar repositories for RetrievalAttention

Users who are interested in RetrievalAttention compare it to the libraries listed below.
