mit-han-lab / duo-attention

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
463 stars · Updated 3 months ago

Alternatives and similar repositories for duo-attention

Users interested in duo-attention are comparing it to the libraries listed below.
