A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research, and we are continuously improving the project. PRs for works (papers, repositories) that the repo has missed are welcome.
☆2,389May 11, 2026Updated 3 weeks ago
Alternatives and similar repositories for Awesome-Model-Quantization
Users who are interested in Awesome-Model-Quantization are comparing it to the libraries listed below.
- List of papers related to neural network quantization in recent AI conferences and journals.☆828Mar 27, 2025Updated last year
- A list of papers, docs, codes about efficient AIGC. This repo is aimed to provide the info for efficient AIGC research, including languag…☆206Feb 10, 2025Updated last year
- Model Quantization Benchmark☆869Apr 20, 2025Updated last year
- Quantization library for PyTorch. Support low-precision and mixed-precision quantization, with hardware implementation through TVM.☆461May 15, 2023Updated 3 years ago
- Pytorch implementation of BRECQ, ICLR 2021☆300Aug 1, 2021Updated 4 years ago
- A curated list of neural network pruning resources.☆2,492Apr 4, 2024Updated 2 years ago
- [IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer☆361Apr 11, 2023Updated 3 years ago
- Summary, Code for Deep Neural Network Quantization☆563May 26, 2026Updated 2 weeks ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models☆1,658Jul 12, 2024Updated last year
- Unofficial implementation of LSQ-Net, a neural network quantization framework☆315May 8, 2024Updated 2 years ago
- micronet, a model compression and deploy lib. compression: 1、quantization: quantization-aware-training(QAT), High-Bit(>2b)(DoReFa/Quantiz…☆2,268May 6, 2025Updated last year
- [CVPR'20] ZeroQ: A Novel Zero Shot Quantization Framework☆280Dec 8, 2023Updated 2 years ago
- PyTorch implementation for the APoT quantization (ICLR 2020)☆287Dec 11, 2024Updated last year
- PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.☆1,801Mar 28, 2024Updated 2 years ago
- The official PyTorch implementation of the ICLR2022 paper, QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quan…☆131Sep 23, 2025Updated 8 months ago
- Awesome LLM compression research papers and tools.☆1,841Feb 23, 2026Updated 3 months ago
- Post-Training Quantization for Vision transformers.☆242Jul 19, 2022Updated 3 years ago
- AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.☆2,634Updated this week
- [ICML 2023] This project is the official implementation of our accepted ICML 2023 paper BiBench: Benchmarking and Analyzing Network Binar…☆56Mar 4, 2024Updated 2 years ago
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".☆2,318Mar 27, 2024Updated 2 years ago
- Awesome machine learning model compression research papers, quantization, tools, and learning material.☆543Sep 21, 2024Updated last year
- [CVPR 2020] This project is the PyTorch implementation of our accepted CVPR 2020 paper : forward and backward information retention for a…☆181Mar 14, 2020Updated 6 years ago
- A curated list for Efficient Large Language Models☆2,018Jun 17, 2025Updated 11 months ago
- Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation. In CVPR 2022.☆138Apr 28, 2022Updated 4 years ago
- A simple network quantization demo using pytorch from scratch.☆542Jun 18, 2023Updated 2 years ago
- PyTorch implementation of Data Free Quantization Through Weight Equalization and Bias Correction.☆264Oct 3, 2023Updated 2 years ago
- Brevitas: neural network quantization in PyTorch☆1,536Updated this week
- [CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs, Vision Foundation Models, etc.☆3,312Sep 7, 2025Updated 9 months ago
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se…☆841Mar 6, 2025Updated last year
- [CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision☆410Feb 26, 2021Updated 5 years ago
- [CVPR 2023] PD-Quant: Post-Training Quantization Based on Prediction Difference Metric☆61Mar 23, 2023Updated 3 years ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration☆3,556Jul 17, 2025Updated 10 months ago
- [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.☆899Nov 26, 2025Updated 6 months ago
- Reorder-based post-training quantization for large language model☆199May 17, 2023Updated 3 years ago
- BitSplit Post-training Quantization☆49Dec 20, 2021Updated 4 years ago
- [ICML'21 Oral] I-BERT: Integer-only BERT Quantization☆268Jan 29, 2023Updated 3 years ago
- This project is the official implementation of our accepted ICLR 2022 paper BiBERT: Accurate Fully Binarized BERT.☆89Jun 2, 2023Updated 3 years ago
- Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm. In ECCV 2…☆185Mar 28, 2021Updated 5 years ago
- Code for our paper at ECCV 2020: Post-Training Piecewise Linear Quantization for Deep Neural Networks☆68Nov 4, 2021Updated 4 years ago