rayleizhu / vllm-ra

[ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts
34 · Updated 8 months ago

Related projects

Alternatives and complementary repositories for vllm-ra