casper-hansen / AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit weight quantization, with up to a 2x speedup during inference.
☆ 2,206 · Updated 2 months ago
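The core operation behind the 4-bit quantization mentioned above can be sketched numerically. The snippet below is a simplified illustration of group-wise asymmetric INT4 weight quantization (per-group scale and zero point, as used by AWQ-style kernels), not AutoAWQ's actual implementation; the function names and group size are illustrative assumptions:

```python
import numpy as np

def quantize_4bit_groupwise(w, group_size=128):
    """Quantize a float weight matrix to 4-bit integers, one scale/zero per group.

    This is a didactic sketch of asymmetric group-wise INT4 quantization,
    not AutoAWQ's real kernel code.
    """
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    # 4 bits -> 16 levels (0..15); asymmetric range [w_min, w_max] per group.
    scale = (w_max - w_min) / 15.0
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale + zero), 0, 15).astype(np.uint8)
    return q, scale, zero

def dequantize(q, scale, zero):
    """Map 4-bit codes back to approximate float weights."""
    return (q.astype(np.float32) - zero) * scale

# Round-trip a random weight matrix and measure the reconstruction error,
# which is bounded by roughly one quantization step per group.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 128)).astype(np.float32)
q, s, z = quantize_4bit_groupwise(w)
w_hat = dequantize(q, s, z).reshape(w.shape)
err = float(np.abs(w - w_hat).max())
```

In practice AutoAWQ additionally applies activation-aware per-channel scaling before quantizing (the "AWQ" part), which this sketch omits.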
Alternatives and similar repositories for AutoAWQ
Users interested in AutoAWQ are comparing it to the libraries listed below.
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (☆ 3,140 · Updated last week)
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers" (☆ 2,143 · Updated last year)
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.