A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research, and we are continuously improving the project. PRs for works (papers, repositories) that the repo has missed are welcome.
☆2,334Jan 29, 2026Updated last month
Alternatives and similar repositories for Awesome-Model-Quantization
Users that are interested in Awesome-Model-Quantization are comparing it to the libraries listed below
- List of papers related to neural network quantization in recent AI conferences and journals.☆809Mar 27, 2025Updated 11 months ago
- A list of papers, docs, and code about efficient AIGC. This repo aims to provide information for efficient AIGC research, including languag…☆203Feb 10, 2025Updated last year
- Model Quantization Benchmark☆861Apr 20, 2025Updated 11 months ago
- Quantization library for PyTorch. Supports low-precision and mixed-precision quantization, with hardware implementation through TVM.☆454May 15, 2023Updated 2 years ago
- PyTorch implementation of BRECQ, ICLR 2021☆292Aug 1, 2021Updated 4 years ago
- A curated list of neural network pruning resources.☆2,491Apr 4, 2024Updated last year
- Summary, Code for Deep Neural Network Quantization☆558Jun 14, 2025Updated 9 months ago
- [IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer☆357Apr 11, 2023Updated 2 years ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models☆1,621Jul 12, 2024Updated last year
- micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), High-Bit (>2b) (DoReFa/Quantiz…☆2,271May 6, 2025Updated 10 months ago
- Unofficial implementation of LSQ-Net, a neural network quantization framework☆311May 8, 2024Updated last year
- [CVPR'20] ZeroQ: A Novel Zero Shot Quantization Framework☆282Dec 8, 2023Updated 2 years ago
- PyTorch implementation for the APoT quantization (ICLR 2020)☆286Dec 11, 2024Updated last year
- PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.☆1,787Mar 28, 2024Updated last year
- Awesome LLM compression research papers and tools.☆1,789Feb 23, 2026Updated 3 weeks ago
- Post-Training Quantization for Vision Transformers.☆240Jul 19, 2022Updated 3 years ago
- The official PyTorch implementation of the ICLR2022 paper, QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quan…☆128Sep 23, 2025Updated 5 months ago
- AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.☆2,566Mar 14, 2026Updated last week
- [ICML 2023] This project is the official implementation of our accepted ICML 2023 paper BiBench: Benchmarking and Analyzing Network Binar…☆56Mar 4, 2024Updated 2 years ago
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".☆2,266Mar 27, 2024Updated last year
- Awesome machine learning model compression research papers, quantization, tools, and learning material.☆539Sep 21, 2024Updated last year
- [CVPR 2020] This project is the PyTorch implementation of our accepted CVPR 2020 paper: forward and backward information retention for a…☆180Mar 14, 2020Updated 6 years ago
- A curated list for Efficient Large Language Models☆1,967Jun 17, 2025Updated 9 months ago
- Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation. In CVPR 2022.☆138Apr 28, 2022Updated 3 years ago
- PyTorch implementation of Data Free Quantization Through Weight Equalization and Bias Correction.☆263Oct 3, 2023Updated 2 years ago
- A simple network quantization demo using PyTorch from scratch.☆541Jun 18, 2023Updated 2 years ago
- Brevitas: neural network quantization in PyTorch☆1,502Updated this week
- [CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs, Vision Foundation Models, etc.☆3,267Sep 7, 2025Updated 6 months ago
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se…☆818Mar 6, 2025Updated last year
- [CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision☆403Feb 26, 2021Updated 5 years ago
- [CVPR 2023] PD-Quant: Post-Training Quantization Based on Prediction Difference Metric☆60Mar 23, 2023Updated 2 years ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration☆3,463Jul 17, 2025Updated 8 months ago
- [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.☆890Nov 26, 2025Updated 3 months ago
- Reorder-based post-training quantization for large language models☆199May 17, 2023Updated 2 years ago
- BitSplit Post-training Quantization☆50Dec 20, 2021Updated 4 years ago
- [ICML'21 Oral] I-BERT: Integer-only BERT Quantization☆266Jan 29, 2023Updated 3 years ago
- This project is the official implementation of our accepted ICLR 2022 paper BiBERT: Accurate Fully Binarized BERT.☆89Jun 2, 2023Updated 2 years ago
- Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm. In ECCV 2…☆186Mar 28, 2021Updated 4 years ago
- Code for our paper at ECCV 2020: Post-Training Piecewise Linear Quantization for Deep Neural Networks☆68Nov 4, 2021Updated 4 years ago