microsoft / RetrievalAttention
[VLDB 26, NeurIPS 25] Scalable long-context LLM decoding that exploits attention sparsity by treating the KV cache as a vector storage system.
135 · Feb 22, 2026 · Updated 2 months ago

Alternatives and similar repositories for RetrievalAttention

Users that are interested in RetrievalAttention are comparing it to the libraries listed below.
