microsoft / RetrievalAttention
[VLDB 26, NeurIPS 25] Scalable long-context LLM decoding that leverages sparsity—by treating the KV cache as a vector storage system.
133 · Feb 22, 2026 · Updated last month

Alternatives and similar repositories for RetrievalAttention

Users who are interested in RetrievalAttention are comparing it to the libraries listed below.
