ModelTC/Outlier_Suppression_Plus


Official implementation of the EMNLP 2023 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling

☆52

Alternatives and similar repositories for Outlier_Suppression_Plus

Users who are interested in Outlier_Suppression_Plus are comparing it to the libraries listed below.

wimh966 / outlier_suppression
The official PyTorch implementation of the NeurIPS 2022 (spotlight) paper, Outlier Suppression: Pushing the Limit of Low-bit Transformer L…
☆49 · Oct 5, 2022 · Updated 3 years ago
xvyaward / owq
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…
☆72 · Mar 7, 2024 · Updated 2 years ago
ChenMnZ / PrefixQuant
An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization
☆176 · Nov 26, 2025 · Updated 7 months ago
hahnyuan / RPTQ4LLM
Reorder-based post-training quantization for large language models
☆199 · May 17, 2023 · Updated 3 years ago
moranshkolnik / RobustQuantization
Source code of the paper: Robust Quantization: One Model to Rule Them All
☆42 · Mar 24, 2023 · Updated 3 years ago
zhangsichengsjtu / AFPQ
AFPQ code implementation
☆23 · Nov 6, 2023 · Updated 2 years ago
aiha-lab / MX-QLLM
LLM Inference with Microscaling Format
☆35 · Nov 12, 2024 · Updated last year
spcl / QuaRot
Code for the NeurIPS 2024 paper: QuaRot, end-to-end 4-bit inference of large language models.
☆523 · Nov 26, 2024 · Updated last year
StiphyJay / MQuant
[ACM MM2025]: MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization
☆44 · Aug 13, 2025 · Updated 11 months ago
AozhongZhang / MagR
☆16 · Jun 22, 2025 · Updated last year
ziplab / QLLM
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…
☆31 · Mar 12, 2024 · Updated 2 years ago
ruikangliu / FlatQuant
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
☆223 · Nov 25, 2025 · Updated 7 months ago
wimh966 / QDrop
The official PyTorch implementation of the ICLR 2022 paper, QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quan…
☆131 · Sep 23, 2025 · Updated 9 months ago
HuangOwen / QAT-ACS
[TMLR] Official PyTorch implementation of the paper "Efficient Quantization-aware Training with Adaptive Coreset Selection"
☆38 · Aug 20, 2024 · Updated last year
ModelTC / quant_horizon
☆11 · Jan 10, 2025 · Updated last year
nbasyl / LLM-FP4
The official implementation of the EMNLP 2023 paper LLM-FP4
☆224 · Dec 15, 2023 · Updated 2 years ago
mit-han-lab / smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
☆1,670 · Jul 12, 2024 · Updated 2 years ago
facebookresearch / LLM-QAT
Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"
☆325 · Mar 4, 2025 · Updated last year
Adamdad / Samesame
A TensorFlow.keras implementation of Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorizatio…
☆10 · Dec 18, 2019 · Updated 6 years ago
IST-DASLab / QUIK
Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference - EMNLP 2024
☆185 · Apr 16, 2024 · Updated 2 years ago
houlu369 / Loss-aware-weight-quantization
Implementation of the ICLR 2018 paper "Loss-aware Weight Quantization of Deep Networks"
☆27 · Oct 24, 2019 · Updated 6 years ago
IST-DASLab / OBC
Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".
☆132 · Jul 11, 2023 · Updated 3 years ago
nbasyl / OFQ
The official implementation of the ICML 2023 paper OFQ-ViT
☆39 · Oct 3, 2023 · Updated 2 years ago
ModelTC / awesome-lm-system
Summary of system papers/frameworks/codes/tools on training or serving large models
☆57 · Dec 17, 2023 · Updated 2 years ago
OpenGVLab / OmniQuant
[ICLR 2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
☆901 · Nov 26, 2025 · Updated 7 months ago
OpenBitSys / BitDistiller
[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.
☆139 · May 16, 2024 · Updated 2 years ago
Hsu1023 / DuQuant
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
☆186 · Apr 24, 2026 · Updated 2 months ago
thu-nics / MBQ
The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models"
☆93 · Mar 17, 2025 · Updated last year
deJQK / FracBits
Neural Network Quantization With Fractional Bit-widths
☆11 · Feb 19, 2021 · Updated 5 years ago
ThisisBillhe / torch_quantizer
torch_quantizer is an out-of-the-box quantization tool for PyTorch models on the CUDA backend, specially optimized for Diffusion Models.
☆25 · Mar 29, 2024 · Updated 2 years ago
ziplab / QTool
Collections of model quantization algorithms. Any issues, please contact Peng Chen (blueardour@gmail.com)
☆73 · Oct 7, 2021 · Updated 4 years ago
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆158 · Nov 20, 2023 · Updated 2 years ago
MingSun-Tse / Why-the-State-of-Pruning-so-Confusing
[Preprint] Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Prunin…
☆41 · Sep 9, 2025 · Updated 10 months ago
facebookresearch / Ternary_Binary_Transformer
ACL 2023
☆39 · Jun 6, 2023 · Updated 3 years ago
RUCAIBox / QuantizedEmpirical
☆15 · Sep 24, 2023 · Updated 2 years ago
HandH1998 / QQQ
QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.
☆157 · Aug 21, 2025 · Updated 11 months ago
facebookresearch / SpinQuant
Code repo for the paper "SpinQuant: LLM quantization with learned rotations"
☆415 · Feb 14, 2025 · Updated last year
42Shawn / PTQ4DM
Implementation of Post-training Quantization on Diffusion Models (CVPR 2023)
☆146 · Apr 1, 2023 · Updated 3 years ago
yhhhli / BRECQ
PyTorch implementation of BRECQ, ICLR 2021
☆300 · Aug 1, 2021 · Updated 4 years ago