4-bit quantization of LLaMA using GPTQ
☆ 3,072 · Updated Jul 13, 2024
Alternatives and similar repositories for GPTQ-for-LLaMa
Users interested in GPTQ-for-LLaMa are comparing it to the libraries listed below.
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ☆ 2,292 · Updated Mar 27, 2024
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ☆ 5,051 · Updated Apr 11, 2025
- A more memory-efficient rewrite of the HF Transformers implementation of LLaMA for use with quantized weights. ☆ 2,915 · Updated Sep 30, 2023
- GPTQ inference Triton kernel. ☆ 321 · Updated May 18, 2023
- Instruct-tune LLaMA on consumer hardware. ☆ 18,945 · Updated Jul 29, 2024
- QLoRA: Efficient Finetuning of Quantized LLMs. ☆ 10,870 · Updated Jun 10, 2024
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration. ☆ 3,503 · Updated Jul 17, 2025
- Accessible large language models via k-bit quantization for PyTorch. ☆ 8,121 · Updated this week
- An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. ☆ 39,448 · Updated Jun 2, 2025
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization. ☆ 717 · Updated Aug 13, 2024
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
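As context for the libraries above, here is a minimal sketch of the kind of 4-bit weight-only quantization they build on: plain NumPy, round-to-nearest with per-group scales. The function names (`quantize_4bit`, `dequantize`) and the group size of 128 are illustrative choices, not APIs from any of these projects; GPTQ and AWQ add error-correcting or activation-aware refinements on top of a baseline like this.

```python
import numpy as np

def quantize_4bit(w, group_size=128):
    """Round-to-nearest 4-bit quantization with one scale per group.

    Simplified illustration only: real libraries (GPTQ, AWQ, ...)
    refine this with Hessian-based updates or activation statistics.
    """
    groups = w.reshape(-1, group_size)
    # Symmetric scheme: map each group's max magnitude onto int4 value 7,
    # so quantized values fall in the signed 4-bit range [-8, 7].
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float weights from int4 values and scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale = quantize_4bit(w)
w_hat = dequantize(q, scale)
# q fits in 4 bits; reconstruction error is bounded by half a scale step.
print(q.min(), q.max(), float(np.abs(w - w_hat).max()))
```

Storing `q` packed two values per byte plus one scale per 128 weights is what yields the roughly 4x memory reduction these repositories advertise over fp16 weights.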