mit-han-lab / omniserve

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
512 · Updated this week

Alternatives and similar repositories for omniserve:

Users who are interested in omniserve are comparing it to the libraries listed below.