mit-han-lab / omniserve

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
686 · Updated 2 months ago

Alternatives and similar repositories for omniserve

Users who are interested in omniserve are comparing it to the libraries listed below.
