ModelCloud / GPTQModelLinks

LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

☆913

Alternatives and similar repositories for GPTQModel

Users that are interested in GPTQModel are comparing it to the libraries listed below

Sorting:

vllm-project / llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
☆2,296Updated this week
intel / auto-round
Advanced quantization toolkit for LLMs and VLMs. Native support for WOQ, MXFP4, NVFP4, GGUF, Adaptive Bits and seamless integration with …
☆735Updated this week
IST-DASLab / marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
☆958Updated last year
microsoft / MInference
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention…
☆1,163Updated 2 months ago
casper-hansen / AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
☆2,284Updated 6 months ago
mit-han-lab / omniserve
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se…
☆786Updated 9 months ago
efeslab / Nanoflow
A throughput-oriented high-performance serving framework for LLMs
☆918Updated last month
microsoft / VPTQ
VPTQ, A Flexible and Extreme low-bit quantization algorithm
☆668Updated 7 months ago
dropbox / hqq
Official implementation of Half-Quadratic Quantization (HQQ)
☆894Updated last month
ModelTC / LightCompress
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLM, VLM, and video generation models.
☆632Updated 2 weeks ago
triton-inference-server / tensorrtllm_backend
The Triton TensorRT-LLM Backend
☆910Updated last week
OpenGVLab / OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
☆875Updated last week
arcee-ai / DistillKit
An Open Source Toolkit For LLM Distillation
☆785Updated 4 months ago
microsoft / T-MAC
Low-bit LLM inference on CPU/NPU with lookup table
☆898Updated 6 months ago
mlc-ai / xgrammar
Fast, Flexible and Portable Structured Generation
☆1,396Updated last week
SafeAILab / EAGLE
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
☆2,035Updated last week
sgl-project / SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆523Updated this week
Cornell-RelaxML / quip-sharp
☆570Updated last year
NVlabs / Minitron
A family of compressed models obtained via pruning and knowledge distillation
☆357Updated last month
allenai / OLMoE
OLMoE: Open Mixture-of-Experts Language Models
☆919Updated 2 months ago
mit-han-lab / smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
☆1,568Updated last year
facebookresearch / SpinQuant
Code repo for the paper "SpinQuant LLM quantization with learned rotations"
☆352Updated 9 months ago
hao-ai-lab / LookaheadDecoding
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
☆1,307Updated 9 months ago
NVIDIA / TensorRT-Model-Optimizer
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresse…
☆1,605Updated this week
ray-project / llmperf
LLMPerf is a library for validating and benchmarking LLMs
☆1,057Updated 11 months ago
neuralmagic / AutoFP8
☆205Updated 7 months ago
OpenGVLab / EfficientQAT
[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
☆316Updated last week
LeanModels / DFloat11
DFloat11: Lossless LLM Compression for Efficient GPU Inference
☆569Updated last week
sgl-project / sgl-learning-materials
Materials for learning SGLang
☆658Updated 2 weeks ago
vllm-project / guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
☆730Updated this week