[NeurIPS 2024 Oral 🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
★180 · Oct 3, 2024 · Updated last year
Alternatives and similar repositories for DuQuant
Users interested in DuQuant are comparing it to the libraries listed below; a short sketch of the rotation idea many of these projects share follows the list.
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization (★172 · Nov 26, 2025 · Updated 3 months ago)
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization" (★210 · Nov 25, 2025 · Updated 3 months ago)
- Code for the NeurIPS 2024 paper: QuaRot, an end-to-end 4-bit inference of large language models (★485 · Nov 26, 2024 · Updated last year)
- [ACL 2024] Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact" (★47 · May 24, 2024 · Updated last year)
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" (★373 · Feb 14, 2025 · Updated last year)
- [ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs (★890 · Nov 26, 2025 · Updated 3 months ago)
- [ACL 2024] A novel QAT framework with self-distillation to enhance ultra-low-bit LLMs (★134 · May 16, 2024 · Updated last year)
- Official implementation of the EMNLP 2023 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti… (★50 · Oct 21, 2023 · Updated 2 years ago)
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving (★336 · Jul 2, 2024 · Updated last year)
- Model Compression Toolbox for Large Language Models and Diffusion Models (★761 · Aug 14, 2025 · Updated 6 months ago)
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations (★241 · Sep 30, 2024 · Updated last year)
- [EMNLP 2024] Quantize LLMs to extremely low bit-widths and finetune the quantized LLMs (★15 · Jul 18, 2024 · Updated last year)
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models (★327 · Nov 26, 2025 · Updated 3 months ago)
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache (★358 · Nov 20, 2025 · Updated 3 months ago)
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization (★402 · Aug 13, 2024 · Updated last year)
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation (★151 · Mar 21, 2025 · Updated 11 months ago)
- The official PyTorch implementation of the NeurIPS 2022 (Spotlight) paper, Outlier Suppression: Pushing the Limit of Low-bit Transformer L… (★49 · Oct 5, 2022 · Updated 3 years ago)
- [COLM 2025] DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; Zhihu writeup: https://zhuanlan.zhihu.c… (★29 · Mar 5, 2025 · Updated last year)
- [ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization (★14 · Nov 27, 2024 · Updated last year)
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models (★1,612 · Jul 12, 2024 · Updated last year)
- PB-LLM: Partially Binarized Large Language Models (★156 · Nov 20, 2023 · Updated 2 years ago)
- PyTorch implementation of PTQ4DiT (https://arxiv.org/abs/2405.16005) (★45 · Nov 8, 2024 · Updated last year)
- [EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models, including LLMs, VLMs, and video generative models (★680 · Nov 19, 2025 · Updated 3 months ago)
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training (★260 · Aug 9, 2025 · Updated 6 months ago)
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs (★228 · Jan 11, 2025 · Updated last year)
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… (★817 · Mar 6, 2025 · Updated 11 months ago)
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs (★122 · Jul 4, 2025 · Updated 8 months ago)
- Code repository of "Evaluating Quantized Large Language Models" (★136 · Sep 8, 2024 · Updated last year)
- [NeurIPS'24] An efficient and accurate memory-saving method for W4A4 large multi-modal models (★98 · Jan 3, 2025 · Updated last year)
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models (★49 · Nov 5, 2024 · Updated last year)
- Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…" (★69 · Mar 7, 2024 · Updated last year)
- The official implementation of the EMNLP 2023 paper LLM-FP4 (★220 · Dec 15, 2023 · Updated 2 years ago)
- PyTorch code for our paper "Progressive Binarization with Semi-Structured Pruning for LLMs" (★13 · Sep 28, 2025 · Updated 5 months ago)
- Reorder-based post-training quantization for large language models (★199 · May 17, 2023 · Updated 2 years ago)
- LLM Inference with Microscaling Format (★34 · Nov 12, 2024 · Updated last year)
- [ICCV 2025] QuEST: Efficient Finetuning for Low-bit Diffusion Models (★57 · Jun 26, 2025 · Updated 8 months ago)
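Several of the repositories above (DuQuant, FlatQuant, QuaRot, SpinQuant, DFRot) build on the same rotation trick for taming activation outliers. The toy sketch below is not code from any of these repos; the dimensions, seed, and the random orthogonal matrix are illustrative assumptions. It shows why the trick is lossless: folding an orthogonal matrix Q into both activations and weights leaves the linear layer's output unchanged while spreading an outlier channel's energy across all channels, which makes low-bit (e.g. W4A4) quantization easier.

```python
# Minimal sketch of rotation-based outlier smoothing (illustrative, not any
# repo's actual code). All sizes and the random choice of Q are assumptions.
import torch

torch.manual_seed(0)
d = 8
x = torch.randn(4, d, dtype=torch.float64)
x[:, 3] *= 50.0                                 # inject an outlier channel
W = torch.randn(d, d, dtype=torch.float64)      # linear layer: y = x @ W.T

# Random orthogonal matrix via QR decomposition (Q @ Q.T == I).
Q, _ = torch.linalg.qr(torch.randn(d, d, dtype=torch.float64))

x_rot = x @ Q        # rotate the activations
W_rot = W @ Q        # fold the same rotation into the weights

y = x @ W.T
y_rot = x_rot @ W_rot.T   # (xQ)(WQ)^T = x Q Q^T W^T = x W^T

print(torch.allclose(y, y_rot))                        # True: output preserved
print(x.abs().max().item(), x_rot.abs().max().item())  # peak magnitude typically drops
```

The listed methods differ mainly in how the transform is chosen: QuaRot uses randomized Hadamard matrices, SpinQuant learns the rotations, and DuQuant combines rotations with channel permutations (its "dual transformation").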