ModelCloud / GPTQModel

Production-ready LLM compression/quantization toolkit with accelerated inference support on both CPU and GPU via Hugging Face Transformers, vLLM, and SGLang.
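For illustration, a minimal quantization sketch following the README-style gptqmodel API (GPTQModel, QuantizeConfig, .load/.quantize/.save); the model id, calibration samples, and exact argument names here are assumptions for the example, not verified against a specific release.

```python
# Minimal sketch: 4-bit GPTQ quantization of a small causal LM with gptqmodel.
# Assumes README-style API; exact names and arguments may differ by release.
from gptqmodel import GPTQModel, QuantizeConfig

# Plain-text calibration samples; real runs typically use a few hundred examples.
calibration_dataset = [
    "GPTQ quantizes weights layer by layer using calibration data.",
    "Quantized models trade a small accuracy loss for much lower memory use.",
]

quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load("meta-llama/Llama-3.2-1B-Instruct", quant_config)  # example model id
model.quantize(calibration_dataset)
model.save("Llama-3.2-1B-Instruct-gptq-4bit")
```

The saved checkpoint can then be served like any other GPTQ model through the supported backends (e.g. Transformers or vLLM).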

Alternatives and similar repositories for GPTQModel:

Users interested in GPTQModel are comparing it to the libraries listed below.