Qualcomm-AI-research/lr-qat

☆54

Alternatives and similar repositories for lr-qat

Users who are interested in lr-qat are comparing it to the libraries listed below.

xvyaward / owq
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…
☆72 · Mar 7, 2024 · Updated 2 years ago
ChenMnZ / PrefixQuant
An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization
☆176 · Nov 26, 2025 · Updated 8 months ago
utkarsh-dmx / project-resq
☆35 · Mar 28, 2025 · Updated last year
HuangOwen / QAT-ACS
[TMLR] Official PyTorch implementation of paper "Efficient Quantization-aware Training with Adaptive Coreset Selection"
☆39 · Aug 20, 2024 · Updated last year
OpenGVLab / EfficientQAT
[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
☆342 · Apr 10, 2026 · Updated 3 months ago
Qualcomm-AI-research / gptvq
☆42 · Mar 28, 2024 · Updated 2 years ago
OpenBitSys / BitDistiller
[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.
☆139 · May 16, 2024 · Updated 2 years ago
facebookresearch / ParetoQ
This repository contains the training code of ParetoQ introduced in our work "ParetoQ Scaling Laws in Extremely Low-bit LLM Quantization"
☆131 · Oct 15, 2025 · Updated 9 months ago
Intelligent-Computing-Lab-Panda / TesseraQ
☆25 · Oct 31, 2024 · Updated last year
ziplab / QLLM
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…
☆31 · Mar 12, 2024 · Updated 2 years ago
facebookresearch / LLM-QAT
Code repo for the paper "LLM-QAT Data-Free Quantization Aware Training for Large Language Models"
☆326 · Mar 4, 2025 · Updated last year
Qualcomm-AI-research / pruning-vs-quantization
☆26 · Mar 1, 2024 · Updated 2 years ago
zhangsichengsjtu / AFPQ
AFPQ code implementation
☆23 · Nov 6, 2023 · Updated 2 years ago
yxli2123 / LoftQ
☆234 · Jun 11, 2024 · Updated 2 years ago
JingyangXiang / DFRot
[COLM 2025] DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; Zhihu: https://zhuanlan.zhihu.c…
☆30 · Mar 5, 2025 · Updated last year
Qualcomm-AI-research / FP8-quantization
☆172 · Mar 9, 2023 · Updated 3 years ago
ylsung / rsq
Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs"
☆23 · Mar 25, 2026 · Updated 4 months ago
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆158 · Nov 20, 2023 · Updated 2 years ago
INT-FlashAttention2024 / INT-FlashAttention
☆91 · Jan 23, 2025 · Updated last year
HandH1998 / QQQ
QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.
☆158 · Aug 21, 2025 · Updated 11 months ago
ruikangliu / FlatQuant
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
☆223 · Nov 25, 2025 · Updated 8 months ago
Hsu1023 / DuQuant
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
☆186 · Apr 24, 2026 · Updated 3 months ago
nbasyl / OFQ
The official implementation of the ICML 2023 paper OFQ-ViT
☆39 · Oct 3, 2023 · Updated 2 years ago
eltociear / qa-lora
PyTorch code for paper QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
☆25 · Sep 27, 2023 · Updated 2 years ago
HuangOwen / RoLoRA
[EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
☆41 · Sep 24, 2024 · Updated last year
PositionalHidden / PositionalHidden
To mitigate position bias in LLMs, especially in long-context scenarios, we scale only one dimension of LLMs, reducing position bias and …
☆12 · Jun 18, 2024 · Updated 2 years ago
AozhongZhang / MagR
☆16 · Jun 22, 2025 · Updated last year
ngocbh / trimkv
[TrimKV] Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs - [DBTrimKV] Make Each Token Count: Towards Improving Lo…
☆15 · May 13, 2026 · Updated 2 months ago
foundation-model-stack / fms-model-optimizer
FMS Model Optimizer is a framework for developing reduced precision neural network models.
☆21 · Jun 24, 2026 · Updated last month
shawnricecake / squant
[ICCAD 2025] Squant
☆15 · Jul 3, 2025 · Updated last year
ByteDance-Seed / decoupleQ
A quantization algorithm for LLM
☆150 · Jun 21, 2024 · Updated 2 years ago
IST-DASLab / HALO
HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx…
☆31 · Feb 17, 2025 · Updated last year
Qualcomm-AI-research / oscillations-qat
☆81 · Jul 21, 2022 · Updated 4 years ago
facebookresearch / SpinQuant
Code repo for the paper "SpinQuant LLM quantization with learned rotations"
☆417 · Feb 14, 2025 · Updated last year
ChengZhang-98 / LQER
Official implementation of ICML'24 paper "LQER: Low-Rank Quantization Error Reconstruction for LLMs"
☆19 · Jul 11, 2024 · Updated 2 years ago
PingchengDong / GQA-LUT
The official implementation of the DAC 2024 paper GQA-LUT
☆24 · Dec 20, 2024 · Updated last year
Janghyun1230 / FastKVzip
Accurate and fast KV cache compression with a gating mechanism
☆27 · Apr 5, 2026 · Updated 3 months ago
nbasyl / LLM-FP4
The official implementation of the EMNLP 2023 paper LLM-FP4
☆225 · Dec 15, 2023 · Updated 2 years ago
yuhuixu1993 / qa-lora
Official PyTorch implementation of QA-LoRA
☆147 · Mar 13, 2024 · Updated 2 years ago