ModelCloud / GPTQModel
Production-ready LLM model compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via Hugging Face, vLLM, and SGLang.
☆375 · Updated this week
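For orientation, a minimal quantization sketch based on GPTQModel's published quickstart (the model ID, calibration slice, and output path below are illustrative, and the API may shift between releases):

```python
# Hedged sketch of the GPTQModel quickstart flow; names follow the
# repo's README but should be checked against the installed version.
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Llama-3.2-1B-Instruct"   # example model
quant_path = "Llama-3.2-1B-Instruct-gptq-4bit"  # example output dir

# GPTQ is a post-training method: it needs a small calibration set of
# raw text to estimate layer activations during quantization.
calibration = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(1024))["text"]

# 4-bit weights with group size 128 is the common GPTQ configuration.
quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration, batch_size=2)
model.save(quant_path)
```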
Alternatives and similar repositories for GPTQModel:
Users interested in GPTQModel are comparing it to the libraries listed below.
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM. ☆1,103 · Updated this week
- Advanced quantization algorithm for LLMs/VLMs. ☆394 · Updated this week
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens. ☆771 · Updated 6 months ago
- Official implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3. ☆1,057 · Updated this week
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models. ☆259 · Updated 5 months ago
- A throughput-oriented high-performance serving framework for LLMs
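Most entries above cover the deployment half of the same workflow. For completeness, a sketch of loading a quantized checkpoint back for inference with GPTQModel itself (reusing the illustrative path from the quickstart sketch above; serving through vLLM or SGLang goes through those projects' own loaders):

```python
from gptqmodel import GPTQModel

# Load the 4-bit checkpoint produced by the quantization sketch above
# (the path is illustrative, not a published model).
model = GPTQModel.load("Llama-3.2-1B-Instruct-gptq-4bit")

# GPTQModel wraps a Transformers model, so generate() works directly.
tokens = model.generate("Quantization reduces memory use by")[0]
print(model.tokenizer.decode(tokens))
```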