ModelCloud / GPTQModel

Production-ready LLM compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.

Related projects

Alternatives and complementary repositories for GPTQModel