microsoft / RetrievalAttention

Scalable long-context LLM decoding that leverages sparsity by treating the KV cache as a vector storage system.
46 · Updated last week

Alternatives and similar repositories for RetrievalAttention

Users who are interested in RetrievalAttention compare it to the libraries listed below.
