zhihu / TLLM_QMM
TLLM_QMM extracts the quantized kernel implementations from NVIDIA's TensorRT-LLM, removes the NVInfer dependency, and exposes them as an easy-to-use PyTorch module. We modified the dequantization and weight preprocessing to align with popular quantization algorithms such as AWQ and GPTQ, and combined them with new FP8 quantization.
16 · Jul 5, 2024 · Updated last year
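
For readers unfamiliar with the dequantization step mentioned in the description, here is a minimal sketch of group-wise 4-bit weight dequantization, the general scheme that AWQ- and GPTQ-style checkpoints rely on: each group of weights shares one scale and one zero-point, and a weight is recovered as `(code - zero) * scale`. This is not TLLM_QMM's API or kernel code. The function names `unpack_int4` and `dequantize_groupwise`, the sequential nibble order, and the flat-list layout are all assumptions made for illustration; real checkpoint formats pack and order their nibbles differently.

```python
from typing import List


def unpack_int4(packed: int, count: int = 8) -> List[int]:
    """Split an integer word into `count` unsigned 4-bit codes.

    Assumption: lowest nibble first. Real formats may interleave nibbles.
    """
    return [(packed >> (4 * i)) & 0xF for i in range(count)]


def dequantize_groupwise(
    codes: List[int],
    scales: List[float],
    zeros: List[int],
    group_size: int,
) -> List[float]:
    """Recover float weights from 4-bit codes, one scale/zero-point per group."""
    if group_size <= 0 or len(codes) % group_size != 0:
        raise ValueError("len(codes) must be a positive multiple of group_size")
    n_groups = len(codes) // group_size
    if len(scales) != n_groups or len(zeros) != n_groups:
        raise ValueError("need exactly one scale and one zero-point per group")

    weights = []
    for i, code in enumerate(codes):
        if not 0 <= code <= 15:
            raise ValueError("codes must be unsigned 4-bit values")
        g = i // group_size  # index of the group this weight belongs to
        weights.append((code - zeros[g]) * scales[g])
    return weights


if __name__ == "__main__":
    codes = unpack_int4(0x76543210)  # eight codes from one 32-bit word
    print(dequantize_groupwise(codes, [0.5, 2.0], [8, 8], group_size=4))
```

A fused kernel would perform this arithmetic on the fly inside the matrix multiply rather than materializing the float weights; the sketch only shows the per-element formula.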

Alternatives and similar repositories for TLLM_QMM

Users that are interested in TLLM_QMM are comparing it to the libraries listed below
