mit-han-lab / duo-attention

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
373 stars · Updated 2 weeks ago
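
The repository's title refers to splitting attention heads into retrieval heads, which keep the full KV cache, and streaming heads, which keep only attention-sink and recent tokens. Below is a minimal sketch of that idea, not the repository's actual implementation: the function name `duo_attention_sketch` and the parameters `is_retrieval_head`, `sink_tokens`, and `recent_window` are illustrative assumptions.

```python
import torch

def duo_attention_sketch(q, k, v, is_retrieval_head,
                         sink_tokens=4, recent_window=256):
    """Toy per-head attention with a retrieval/streaming split.

    q: [num_heads, q_len, head_dim]
    k, v: [num_heads, kv_len, head_dim]
    is_retrieval_head: one bool per head (illustrative labels).
    Causal masking is omitted to keep the sketch short.
    """
    outputs = []
    for h in range(q.shape[0]):
        k_h, v_h = k[h], v[h]
        if not is_retrieval_head[h] and k_h.shape[0] > sink_tokens + recent_window:
            # Streaming head: keep only the attention-sink tokens at the start
            # of the cache plus a window of the most recent tokens.
            k_h = torch.cat([k_h[:sink_tokens], k_h[-recent_window:]])
            v_h = torch.cat([v_h[:sink_tokens], v_h[-recent_window:]])
        # Retrieval heads (and short caches) fall through with the full cache.
        # Standard scaled dot-product attention over the (possibly pruned) cache.
        scores = q[h] @ k_h.transpose(-1, -2) / k_h.shape[-1] ** 0.5
        outputs.append(torch.softmax(scores, dim=-1) @ v_h)
    return torch.stack(outputs)
```

In this sketch, memory savings come only from the streaming heads, whose pruned caches stay constant-size as context grows, while the retrieval heads preserve long-range access; consult the repository itself for how heads are identified and how the caches are actually managed.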

Related projects

Alternatives and complementary repositories for duo-attention