mit-han-lab / omniserve

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
801 · Updated 10 months ago

Alternatives and similar repositories for omniserve

Users who are interested in omniserve are comparing it to the libraries listed below.
