zhangsichengsjtu/AFPQ

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/zhangsichengsjtu/AFPQ)

zhangsichengsjtu / AFPQ

AFPQ code implementation

☆23

Alternatives and similar repositories for AFPQ

Users that are interested in AFPQ are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

OpenBitSys / BitDistiller
View on GitHub
[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.
☆139May 16, 2024Updated 2 years ago
RUCAIBox / QuantizedEmpirical
View on GitHub
☆15Sep 24, 2023Updated 2 years ago
ModelTC / Outlier_Suppression_Plus
View on GitHub
Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…
☆52Oct 21, 2023Updated 2 years ago
aiha-lab / TSLD
View on GitHub
[NeurIPS 2023] Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
☆18Dec 6, 2023Updated 2 years ago
HuangOwen / QAT-ACS
View on GitHub
[TMLR] Official PyTorch implementation of paper "Efficient Quantization-aware Training with Adaptive Coreset Selection"
☆39Aug 20, 2024Updated last year
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
GATECH-EIC / ShiftAddViT
View on GitHub
[NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
☆30Dec 6, 2023Updated 2 years ago
lliai / EMQ-series
View on GitHub
[ICCV-2023] EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization
☆29Dec 6, 2023Updated 2 years ago
fmfi-compbio / admm-pruning
View on GitHub
☆30Jul 22, 2024Updated 2 years ago
HuangOwen / RoLoRA
View on GitHub
[EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
☆41Sep 24, 2024Updated last year
hahnyuan / PB-LLM
View on GitHub
PB-LLM: Partially Binarized Large Language Models
☆158Nov 20, 2023Updated 2 years ago
nbasyl / LLM-FP4
View on GitHub
The official implementation of the EMNLP 2023 paper LLM-FP4
☆225Dec 15, 2023Updated 2 years ago
Aaronhuang-778 / SliM-LLM
View on GitHub
[ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
☆62Aug 9, 2024Updated last year
Intelligent-Computing-Lab-Panda / TesseraQ
View on GitHub
☆25Oct 31, 2024Updated last year
ICTMCG / SDTM
View on GitHub
Official repository for "Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration", which has been …
☆17Sep 29, 2025Updated 9 months ago
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
xvyaward / owq
View on GitHub
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…
☆72Mar 7, 2024Updated 2 years ago
ylsung / rsq
View on GitHub
Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs"
☆23Mar 25, 2026Updated 4 months ago
ChenMnZ / PrefixQuant
View on GitHub
An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization
☆176Nov 26, 2025Updated 8 months ago
JunyiWuCode / QuantCache
View on GitHub
[ICCV 2025] QuantCache：Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation
☆18Sep 26, 2025Updated 10 months ago
HuangOwen / Quantization-Variation
View on GitHub
[TMLR] Official PyTorch implementation of paper "Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precisio…
☆49Sep 27, 2024Updated last year
Adamdad / Samesame
View on GitHub
An Tensorflow.keras implementation of Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorizatio…
☆10Dec 18, 2019Updated 6 years ago
aiha-lab / MX-QLLM
View on GitHub
LLM Inference with Microscaling Format
☆35Nov 12, 2024Updated last year
Qualcomm-AI-research / FP8-quantization
View on GitHub
☆172Mar 9, 2023Updated 3 years ago
Qualcomm-AI-research / lr-qat
View on GitHub
☆54Nov 5, 2024Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
xjjxmu / QSLAW
View on GitHub
The official code for "Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation" | [MM2…
☆14Dec 7, 2024Updated last year
kyrie-23 / linear_task_arithmetic
View on GitHub
☆12Jul 30, 2025Updated 11 months ago
MAC-AutoML / OMPQ
View on GitHub
☆25Dec 11, 2021Updated 4 years ago
thu-nics / MBQ
View on GitHub
The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models"
☆93Mar 17, 2025Updated last year
chihhuiho / yoro
View on GitHub
☆16Nov 14, 2022Updated 3 years ago
Xingyu-Zheng / FOEM
View on GitHub
(AAAI 2026) First-Order Error Matters: Accurate Compensation for Quantized Large Language Models
☆16Apr 16, 2026Updated 3 months ago
yxli2123 / LoftQ
View on GitHub
☆234Jun 11, 2024Updated 2 years ago
lliai / DisWOT-CVPR2023
View on GitHub
☆28Nov 29, 2023Updated 2 years ago
OpenGVLab / LLMPrune-BESA
View on GitHub
BESA is a differentiable weight pruning technique for large language models.
☆17Mar 4, 2024Updated 2 years ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
HandH1998 / QQQ
View on GitHub
QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.
☆158Aug 21, 2025Updated 11 months ago
nbasyl / OFQ
View on GitHub
The official implementation of the ICML 2023 paper OFQ-ViT
☆39Oct 3, 2023Updated 2 years ago
IST-DASLab / HALO
View on GitHub
HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx…
☆31Feb 17, 2025Updated last year
IBM / pfloat
View on GitHub
A 8-/16-/32-/64-bit floating point number family
☆16Feb 4, 2022Updated 4 years ago
utkarsh-dmx / project-resq
View on GitHub
☆35Mar 28, 2025Updated last year
IST-DASLab / RoSA
View on GitHub
Official implementation of the ICML 2024 paper RoSA (Robust Adaptation)
☆46May 20, 2026Updated 2 months ago
ruikangliu / FlatQuant
View on GitHub
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
☆223Nov 25, 2025Updated 8 months ago