mit-han-lab / omniserve

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
812 · Mar 6, 2025 · Updated 11 months ago

Alternatives and similar repositories for omniserve

Users who are interested in omniserve are comparing it to the libraries listed below.
