ModelCloud / GPTQModel

Production-ready LLM model compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
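For illustration, a minimal sketch of the typical quantize-and-save flow with the GPTQModel Python API, following the project's documented usage; the model ID, calibration data, and output path are placeholders, and exact signatures may vary between releases:

```python
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Llama-3.2-1B-Instruct"   # placeholder base model
quant_path = "Llama-3.2-1B-gptqmodel-4bit"      # placeholder output dir

# Small calibration set of raw text samples (placeholder dataset choice)
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(1024))["text"]

# 4-bit weights with group size 128 (common GPTQ settings)
quant_config = QuantizeConfig(bits=4, group_size=128)

# Load the FP model, run GPTQ calibration, then save the quantized weights
model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration_dataset, batch_size=2)
model.save(quant_path)
```

The saved checkpoint can then be loaded for inference via GPTQModel itself or the supported backends (HF Transformers, vLLM, SGLang).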

Alternatives and similar repositories for GPTQModel:

Users interested in GPTQModel are comparing it to the libraries listed below.