rayleizhu / vllm-ra

[ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts
39 · Updated last year

Alternatives and similar repositories for vllm-ra:

Users who are interested in vllm-ra are comparing it to the libraries listed below.