casper-hansen / AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit weight quantization, delivering up to a 2x speedup during inference.
☆2,021 · Updated 2 weeks ago
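For context, quantizing a model with AutoAWQ follows a short load → quantize → save flow, as shown in the project README. The sketch below mirrors that example; the model checkpoint and output path are placeholder assumptions, and the quantization settings are the commonly documented defaults.

```python
# Minimal sketch of the AutoAWQ quantization flow, following the project README.
# The model path and output directory below are placeholder assumptions.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed example checkpoint
quant_path = "mistral-7b-instruct-awq"             # assumed output directory

# 4-bit weights, group size 128, GEMM kernel: the defaults shown in the docs.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run activation-aware quantization, then persist the 4-bit model.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

The saved directory can then be loaded for inference with `AutoAWQForCausalLM.from_quantized(quant_path)` or through Transformers' AWQ integration.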
Alternatives and similar repositories for AutoAWQ:
Users interested in AutoAWQ often compare it to the libraries listed below.
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ☆2,861 · Updated this week
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ☆2,063 · Updated 11 months ago
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance. ☆3,042 · Updated this week
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ☆4,766 · Updated this week
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads ☆2,466 · Updated 8 months ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters