dropbox/hqq

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/dropbox/hqq)

dropbox / hqq

Official implementation of Half-Quadratic Quantization (HQQ)

☆949

Alternatives and similar repositories for hqq

Users that are interested in hqq are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

dropbox / gemlite
View on GitHub
Fast low-bit matmul kernels in Triton
☆477Updated this week
Vahe1994 / AQLM
View on GitHub
Official Pytorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.p…
☆1,324Feb 26, 2026Updated 4 months ago
huggingface / optimum-quanto
View on GitHub
A pytorch quantization backend for optimum
☆1,046Updated this week
casper-hansen / AutoAWQ
View on GitHub
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
☆2,350May 11, 2025Updated last year
IST-DASLab / marlin
View on GitHub
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
☆1,109Sep 4, 2024Updated last year
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
spcl / QuaRot
View on GitHub
Code for Neurips24 paper: QuaRot, an end-to-end 4-bit inference of large language models.
☆523Nov 26, 2024Updated last year
Cornell-RelaxML / QuIP
View on GitHub
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
☆399Feb 24, 2024Updated 2 years ago
SqueezeAILab / SqueezeLLM
View on GitHub
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
☆722Aug 13, 2024Updated last year
Cornell-RelaxML / quip-sharp
View on GitHub
☆600Oct 29, 2024Updated last year
pytorch / ao
View on GitHub
PyTorch native quantization and sparsity for training and inference
☆2,909Updated this week
AnswerDotAI / fsdp_qlora
View on GitHub
Training LLMs with QLoRA + FSDP
☆1,549Nov 9, 2024Updated last year
bitsandbytes-foundation / bitsandbytes
View on GitHub
Accessible large language models via k-bit quantization for PyTorch.
☆8,333Updated this week
OpenGVLab / OmniQuant
View on GitHub
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
☆901Nov 26, 2025Updated 7 months ago
HanGuo97 / flute
View on GitHub
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
☆391Apr 13, 2025Updated last year
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
mit-han-lab / llm-awq
View on GitHub
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
☆3,592Jul 17, 2025Updated last year
wejoncy / QLLM
View on GitHub
A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily.
☆190Mar 23, 2026Updated 3 months ago
AutoGPTQ / AutoGPTQ
View on GitHub
An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.
☆5,073Apr 11, 2025Updated last year
microsoft / BitBLAS
View on GitHub
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
☆769Aug 6, 2025Updated 11 months ago
IST-DASLab / gptq
View on GitHub
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
☆2,334Mar 27, 2024Updated 2 years ago
jy-yuan / KIVI
View on GitHub
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
☆418Nov 20, 2025Updated 8 months ago
Aaronhuang-778 / BiLLM
View on GitHub
[ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
☆235Jan 11, 2025Updated last year
OpenGVLab / EfficientQAT
View on GitHub
[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
☆342Apr 10, 2026Updated 3 months ago
IST-DASLab / qmoe
View on GitHub
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆278Nov 3, 2023Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
microsoft / VPTQ
View on GitHub
VPTQ, A Flexible and Extreme low-bit quantization algorithm
☆682Apr 25, 2025Updated last year
mit-han-lab / omniserve
View on GitHub
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se…
☆850Mar 6, 2025Updated last year
fpgaminer / GPTQ-triton
View on GitHub
GPTQ inference Triton kernel
☆323May 18, 2023Updated 3 years ago
arcee-ai / mergekit
View on GitHub
Tools for merging pretrained large language models.
☆7,246Jun 17, 2026Updated last month
Vahe1994 / SpQR
View on GitHub
☆554Feb 8, 2026Updated 5 months ago
efeslab / Atom
View on GitHub
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
☆343Jul 2, 2024Updated 2 years ago
jiaweizzhao / GaLore
View on GitHub
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
☆1,699Oct 28, 2024Updated last year
yxli2123 / LoftQ
View on GitHub
☆234Jun 11, 2024Updated 2 years ago
NetEase-FuXi / EETQ
View on GitHub
Easy and Efficient Quantization for Transformers
☆205Mar 25, 2026Updated 3 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
turboderp-org / exllamav2
View on GitHub
A fast inference library for running LLMs locally on modern consumer-class GPUs
☆4,586Mar 4, 2026Updated 4 months ago
IST-DASLab / QUIK
View on GitHub
Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024
☆185Apr 16, 2024Updated 2 years ago
mit-han-lab / smoothquant
View on GitHub
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
☆1,670Jul 12, 2024Updated 2 years ago
punica-ai / punica
View on GitHub
Serving multiple LoRA finetuned LLM as one
☆1,166May 8, 2024Updated 2 years ago
HazyResearch / ThunderKittens
View on GitHub
Tile primitives for speedy kernels
☆3,552Jul 13, 2026Updated last week
IST-DASLab / FP-Quant
View on GitHub
☆114Feb 26, 2026Updated 4 months ago
Cornell-RelaxML / qtip
View on GitHub
☆180Jun 22, 2025Updated last year