rayleizhu / vllm-ra

[ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts
38 · Updated 11 months ago

Alternatives and similar repositories for vllm-ra:

Users who are interested in vllm-ra compare it to the libraries listed below.