A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research, and we are continuously improving the project. PRs adding works (papers, repositories) that the repo has missed are welcome.
☆2,360 · Apr 25, 2026 · Updated this week
Alternatives and similar repositories for Awesome-Model-Quantization
Users that are interested in Awesome-Model-Quantization are comparing it to the libraries listed below.
- List of papers related to neural network quantization in recent AI conferences and journals. ☆820 · Mar 27, 2025 · Updated last year
- A list of papers, docs, codes about efficient AIGC. This repo is aimed to provide the info for efficient AIGC research, including languag… ☆205 · Feb 10, 2025 · Updated last year
- Model Quantization Benchmark ☆865 · Apr 20, 2025 · Updated last year
- Quantization library for PyTorch. Support low-precision and mixed-precision quantization, with hardware implementation through TVM. ☆461 · May 15, 2023 · Updated 2 years ago
- Pytorch implementation of BRECQ, ICLR 2021 ☆297 · Aug 1, 2021 · Updated 4 years ago
- A curated list of neural network pruning resources. ☆2,493 · Apr 4, 2024 · Updated 2 years ago
- Summary, Code for Deep Neural Network Quantization ☆561 · Jun 14, 2025 · Updated 10 months ago
- [IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer ☆361 · Apr 11, 2023 · Updated 3 years ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ☆1,641 · Jul 12, 2024 · Updated last year
- micronet, a model compression and deploy lib. compression: 1、quantization: quantization-aware-training(QAT), High-Bit(>2b)(DoReFa/Quantiz… ☆2,271 · May 6, 2025 · Updated 11 months ago
- Unofficial implementation of LSQ-Net, a neural network quantization framework ☆312 · May 8, 2024 · Updated last year
- [CVPR'20] ZeroQ: A Novel Zero Shot Quantization Framework ☆282 · Dec 8, 2023 · Updated 2 years ago
- PyTorch implementation for the APoT quantization (ICLR 2020) ☆286 · Dec 11, 2024 · Updated last year
- PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool. ☆1,793 · Mar 28, 2024 · Updated 2 years ago
- The official PyTorch implementation of the ICLR2022 paper, QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quan… ☆130 · Sep 23, 2025 · Updated 7 months ago
- Awesome LLM compression research papers and tools. ☆1,824 · Feb 23, 2026 · Updated 2 months ago
- Post-Training Quantization for Vision transformers. ☆242 · Jul 19, 2022 · Updated 3 years ago
- AIMET is a library that provides advanced quantization and compression techniques for trained neural network models. ☆2,604 · Updated this week
- [ICML 2023] This project is the official implementation of our accepted ICML 2023 paper BiBench: Benchmarking and Analyzing Network Binar… ☆56 · Mar 4, 2024 · Updated 2 years ago
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ☆2,301 · Mar 27, 2024 · Updated 2 years ago
- Awesome machine learning model compression research papers, quantization, tools, and learning material. ☆542 · Sep 21, 2024 · Updated last year
- [CVPR 2020] This project is the PyTorch implementation of our accepted CVPR 2020 paper : forward and backward information retention for a… ☆180 · Mar 14, 2020 · Updated 6 years ago
- A curated list for Efficient Large Language Models ☆1,993 · Jun 17, 2025 · Updated 10 months ago
- Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation. In CVPR 2022. ☆138 · Apr 28, 2022 · Updated 4 years ago
- A simple network quantization demo using pytorch from scratch. ☆542 · Jun 18, 2023 · Updated 2 years ago
- PyTorch implementation of Data Free Quantization Through Weight Equalization and Bias Correction. ☆263 · Oct 3, 2023 · Updated 2 years ago
- Brevitas: neural network quantization in PyTorch ☆1,524 · Apr 23, 2026 · Updated last week
- [CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs, Vision Foundation Models, etc. ☆3,296 · Sep 7, 2025 · Updated 7 months ago
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ☆834 · Mar 6, 2025 · Updated last year
- [CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision ☆408 · Feb 26, 2021 · Updated 5 years ago
- [CVPR 2023] PD-Quant: Post-Training Quantization Based on Prediction Difference Metric ☆61 · Mar 23, 2023 · Updated 3 years ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ☆3,512 · Jul 17, 2025 · Updated 9 months ago
- [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs. ☆891 · Nov 26, 2025 · Updated 5 months ago
- Reorder-based post-training quantization for large language model ☆199 · May 17, 2023 · Updated 2 years ago
- BitSplit Post-training Quantization ☆49 · Dec 20, 2021 · Updated 4 years ago
- [ICML'21 Oral] I-BERT: Integer-only BERT Quantization ☆268 · Jan 29, 2023 · Updated 3 years ago
- Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm. In ECCV 2… ☆186 · Mar 28, 2021 · Updated 5 years ago
- This project is the official implementation of our accepted ICLR 2022 paper BiBERT: Accurate Fully Binarized BERT. ☆89 · Jun 2, 2023 · Updated 2 years ago
- Code for our paper at ECCV 2020: Post-Training Piecewise Linear Quantization for Deep Neural Networks ☆68 · Nov 4, 2021 · Updated 4 years ago