mit-han-lab / duo-attention

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
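The title names DuoAttention's core idea: only some attention heads ("retrieval heads") need the full KV cache, while the rest ("streaming heads") attend mainly to initial attention-sink tokens plus a recent window. A minimal sketch of the streaming-head cache-retention policy follows; the function and parameter names are illustrative assumptions, not the repo's actual API:

```python
def streaming_kv_keep(seq_len: int, num_sink: int = 4, window: int = 8) -> list[int]:
    """Return the KV-cache positions a streaming head retains:
    the first `num_sink` attention-sink tokens plus the most recent
    `window` tokens. Retrieval heads, by contrast, keep every position.
    Illustrative sketch only; names/defaults are assumptions."""
    if seq_len <= num_sink + window:
        # Short sequences fit entirely; nothing is evicted.
        return list(range(seq_len))
    return list(range(num_sink)) + list(range(seq_len - window, seq_len))
```

With `num_sink=4` and `window=8`, a 20-token sequence keeps positions 0-3 and 12-19, so the streaming head's cache stays constant-size regardless of context length.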
☆ 418 · Updated last week

Alternatives and similar repositories for duo-attention:
