A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously improved. PRs adding works (papers, repositories) missing from the repo are welcome.
☆2,327 · Jan 29, 2026 · Updated last month
Alternatives and similar repositories for Awesome-Model-Quantization
Users interested in Awesome-Model-Quantization compare it to the libraries listed below.
- List of papers related to neural network quantization in recent AI conferences and journals. (☆805, updated 11 months ago)
- Model Quantization Benchmark. (☆858, updated 10 months ago)
- A list of papers, docs, and code about efficient AIGC. This repo aims to provide information for efficient AIGC research, including languag… (☆204, updated last year)
- Quantization library for PyTorch. Supports low-precision and mixed-precision quantization, with hardware implementation through TVM. (☆453, updated 2 years ago)
- PyTorch implementation of BRECQ (ICLR 2021). (☆290, updated 4 years ago)
- A curated list of neural network pruning resources. (☆2,492, updated last year)
- Summary and code for deep neural network quantization. (☆558, updated 8 months ago)
- [IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer. (☆360, updated 2 years ago)
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models. (☆1,612, updated last year)
- micronet, a model compression and deployment library. Compression includes quantization-aware training (QAT), High-Bit (>2b) (DoReFa/Quantiz… (☆2,271, updated 9 months ago)
- Unofficial implementation of LSQ-Net, a neural network quantization framework. (☆310, updated last year)
- [CVPR'20] ZeroQ: A Novel Zero Shot Quantization Framework. (☆281, updated 2 years ago)
- PPL Quantization Tool (PPQ), a powerful offline neural network quantization tool. (☆1,785, updated last year)
- PyTorch implementation of APoT quantization (ICLR 2020). (☆283, updated last year)
- AIMET, a library providing advanced quantization and compression techniques for trained neural network models. (☆2,563, updated this week)
- Awesome LLM compression research papers and tools. (☆1,780, updated this week)
- Post-training quantization for vision transformers. (☆238, updated 3 years ago)
- Official PyTorch implementation of the ICLR 2022 paper QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quan… (☆128, updated 5 months ago)
- PyTorch implementation of Data-Free Quantization Through Weight Equalization and Bias Correction. (☆263, updated 2 years ago)
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". (☆2,261, updated last year)
- [ICML 2023] Official implementation of BiBench: Benchmarking and Analyzing Network Binar… (☆56, updated last year)
- [CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs, vision foundation models, etc. (☆3,258, updated 5 months ago)
- Brevitas: neural network quantization in PyTorch. (☆1,488, updated this week)
- A simple network quantization demo using PyTorch from scratch. (☆542, updated 2 years ago)
- A curated list for Efficient Large Language Models. (☆1,954, updated 8 months ago)
- Awesome machine learning model compression research papers, quantization, tools, and learning material. (☆540, updated last year)
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… (☆816, updated 11 months ago)
- [CVPR 2020] PyTorch implementation of the paper on forward and backward information retention for a… (☆181, updated 5 years ago)
- [CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision. (☆404, updated 5 years ago)
- [CVPR 2022] Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation. (☆138, updated 3 years ago)
- [ICML'21 Oral] I-BERT: Integer-only BERT Quantization. (☆265, updated 3 years ago)
- [ICLR 2024 Spotlight] OmniQuant, a simple and powerful quantization technique for LLMs. (☆889, updated 3 months ago)
- Unofficial PyTorch implementation of Learned Step Size Quantization (LSQ), ICLR 2020. (☆139, updated 5 years ago)
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration. (☆3,441, updated 7 months ago)
- Reorder-based post-training quantization for large language models. (☆199, updated 2 years ago)
- [ECCV 2020] Code for Post-Training Piecewise Linear Quantization for Deep Neural Networks. (☆68, updated 4 years ago)
- [ECCV 2020] ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions. (☆263, updated 4 years ago)
- SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) and sparsity; leading model compression techniques on PyTorch, TensorFlow, … (☆2,590, updated this week)
- Code for the NeurIPS 2024 paper QuaRot: end-to-end 4-bit inference of large language models. (☆483, updated last year)
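Most of the post-training quantization works above refine the same baseline idea: map float weights to low-bit integers via a scale factor, then dequantize at compute time. Below is a minimal, stdlib-only sketch of symmetric per-tensor INT8 quantization — not taken from any listed repository; the function names `quantize_int8` and `dequantize` are illustrative.

```python
# Minimal sketch of symmetric per-tensor INT8 post-training quantization.
# Not from any listed repo; names are illustrative. Real libraries (e.g.
# PPQ, Brevitas, AIMET) add calibration, per-channel scales, and clipping.

def quantize_int8(weights):
    """Map float weights to int8 codes with a single symmetric scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0  # 127 = int8 positive max
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [qi * scale for qi in q]

w = [0.5, -1.27, 0.003, 1.0]
q, s = quantize_int8(w)       # q = [50, -127, 0, 100]
w_hat = dequantize(q, s)      # each entry within s/2 of the original
```

The quantization error per weight is bounded by half the scale (`s / 2`); methods like BRECQ, GPTQ, and AWQ in the list above reduce the *task-level* impact of that error rather than the per-weight bound itself.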