htqin/IR-QLoRA

[ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retention

☆65

Alternatives and similar repositories for IR-QLoRA

Users that are interested in IR-QLoRA are comparing it to the libraries listed below.

HuangOwen / RoLoRA
[EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
☆39 · Sep 24, 2024 · Updated last year
Intelligent-Computing-Lab-Panda / GPTAQ
Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)
☆89 · Jul 28, 2025 · Updated 9 months ago
htqin / QuantSR
[NeurIPS 2023 Spotlight] This project is the official implementation of our accepted NeurIPS 2023 (spotlight) paper QuantSR: Accurate Low…
☆52 · May 13, 2024 · Updated last year
yxli2123 / LoftQ
☆235 · Jun 11, 2024 · Updated last year
ChenMnZ / PrefixQuant
An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization
☆171 · Nov 26, 2025 · Updated 5 months ago
Macaronlin / LLaMA3-Quantization
A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods.
☆199 · Jan 14, 2025 · Updated last year
iLearn-Lab / ACL25-PTQ1.61
☆15 · Apr 6, 2026 · Updated last month
htqin / DSG
This project is the official implementation of our accepted IEEE TPAMI paper Diverse Sample Generation: Pushing the Limit of Data-free Qu…
☆15 · Feb 26, 2023 · Updated 3 years ago
Efficient-ML / Awesome-Efficient-AIGC
A list of papers, docs, codes about efficient AIGC. This repo is aimed to provide the info for efficient AIGC research, including languag…
☆205 · Feb 10, 2025 · Updated last year
eltociear / qa-lora
Pytorch code for paper QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
☆25 · Sep 27, 2023 · Updated 2 years ago
ThisisBillhe / EfficientDM
[ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di…
☆71 · Jun 4, 2024 · Updated last year
ClubieDong / QAQ-KVCacheQuantization
QAQ: Quality Adaptive Quantization for LLM KV Cache
☆54 · Mar 27, 2024 · Updated 2 years ago
facebookresearch / SpinQuant
Code repo for the paper "SpinQuant: LLM quantization with learned rotations"
☆390 · Feb 14, 2025 · Updated last year
xvyaward / owq
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…
☆69 · Mar 7, 2024 · Updated 2 years ago
OpenGVLab / EfficientQAT
[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
☆339 · Apr 10, 2026 · Updated 3 weeks ago
UNITES-Lab / MoE-Quantization
Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark"
☆30 · Jun 30, 2025 · Updated 10 months ago
bytedance / AffineQuant
Official implementation of the ICLR 2024 paper AffineQuant
☆30 · Mar 30, 2024 · Updated 2 years ago
yuhuixu1993 / qa-lora
Official PyTorch implementation of QA-LoRA
☆146 · Mar 13, 2024 · Updated 2 years ago
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆155 · Nov 20, 2023 · Updated 2 years ago
Aaronhuang-778 / BiLLM
[ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
☆229 · Jan 11, 2025 · Updated last year
YouAreSpecialToMe / QST
Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models
☆49 · Nov 5, 2024 · Updated last year
RUCAIBox / QuantizedEmpirical
☆15 · Sep 24, 2023 · Updated 2 years ago
HanGuo97 / lq-lora
☆129 · Jan 22, 2024 · Updated 2 years ago
wzhuang-xmu / LoSA
[ICLR 2025] Official implementation of paper "Dynamic Low-Rank Sparse Adaptation for Large Language Models".
☆24 · Mar 16, 2025 · Updated last year
thu-nics / ViDiT-Q
[ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
☆156 · Mar 21, 2025 · Updated last year
Cornell-RelaxML / quip-sharp
☆591 · Oct 29, 2024 · Updated last year
chenbong / PSS-Net
☆17 · Jul 10, 2022 · Updated 3 years ago
htqin / BiBERT
This project is the official implementation of our accepted ICLR 2022 paper BiBERT: Accurate Fully Binarized BERT.
☆89 · Jun 2, 2023 · Updated 2 years ago
nasosger / MuToR
[NeurIPS '25] Multi-Token Prediction Needs Registers
☆29 · Dec 14, 2025 · Updated 4 months ago
ModelTC / QLLM
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…
☆40 · Mar 11, 2024 · Updated 2 years ago
Qualcomm-AI-research / lr-qat
☆52 · Nov 5, 2024 · Updated last year
Aaronhuang-778 / Mixture-Compressor-MoE
[ICLR 2025, IEEE TPAMI 2026] Mixture Compressor & MC#
☆73 · Feb 12, 2025 · Updated last year
Intelligent-Computing-Lab-Panda / TesseraQ
☆25 · Oct 31, 2024 · Updated last year
jy-yuan / KIVI
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
☆390 · Nov 20, 2025 · Updated 5 months ago
SqueezeAILab / KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
☆421 · Aug 13, 2024 · Updated last year
OpenGVLab / OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
☆896 · Nov 26, 2025 · Updated 5 months ago
aojunzz / DominoSearch
☆19 · Dec 10, 2021 · Updated 4 years ago
XIANGLONGYAN / PBS2P
PyTorch code for our paper "Progressive Binarization with Semi-Structured Pruning for LLMs"
☆13 · Mar 11, 2026 · Updated last month
sramshetty / ShortGPT
Unofficial implementations of block/layer-wise pruning methods for LLMs.
☆78 · Apr 29, 2024 · Updated 2 years ago