mit-han-lab / duo-attention

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
348 stars · Updated last week

Related projects

Alternatives and complementary repositories for duo-attention