[NeurIPS 2024 Oral 🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
★179 · Oct 3, 2024 · Updated last year
Alternatives and similar repositories for DuQuant
Users interested in DuQuant are comparing it to the repositories listed below.
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization ★172 · Nov 26, 2025 · Updated 3 months ago
- Code for the NeurIPS 2024 paper QuaRot, an end-to-end 4-bit inference method for large language models ★492 · Nov 26, 2024 · Updated last year
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization" ★211 · Nov 25, 2025 · Updated 4 months ago
- [ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs. ★892 · Nov 26, 2025 · Updated 3 months ago
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" ★380 · Feb 14, 2025 · Updated last year
- [ACL 2024] Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact" ★47 · May 24, 2024 · Updated last year
- [ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs. ★134 · May 16, 2024 · Updated last year
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving ★336 · Jul 2, 2024 · Updated last year
- Official implementation of the EMNLP 2023 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti… ★51 · Oct 21, 2023 · Updated 2 years ago
- ★52 · Nov 5, 2024 · Updated last year
- ★581 · Oct 29, 2024 · Updated last year
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ★330 · Nov 26, 2025 · Updated 3 months ago
- [EMNLP 2024] Quantize LLMs to extremely low bit-widths and finetune the quantized LLMs ★15 · Jul 18, 2024 · Updated last year
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations ★242 · Sep 30, 2024 · Updated last year
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ★236 · Aug 18, 2025 · Updated 7 months ago
- [COLM 2025] DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; Zhihu: https://zhuanlan.zhihu.c… ★29 · Mar 5, 2025 · Updated last year
- Model Compression Toolbox for Large Language Models and Diffusion Models ★764 · Aug 14, 2025 · Updated 7 months ago
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation ★154 · Mar 21, 2025 · Updated last year
- PB-LLM: Partially Binarized Large Language Models ★156 · Nov 20, 2023 · Updated 2 years ago
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization ★408 · Aug 13, 2024 · Updated last year
- PyTorch code for the paper "Progressive Binarization with Semi-Structured Pruning for LLMs" ★13 · Mar 11, 2026 · Updated 2 weeks ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ★1,625 · Jul 12, 2024 · Updated last year
- The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models" ★83 · Mar 17, 2025 · Updated last year
- The official PyTorch implementation of the NeurIPS 2022 (Spotlight) paper, Outlier Suppression: Pushing the Limit of Low-bit Transformer L… ★49 · Oct 5, 2022 · Updated 3 years ago
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ★821 · Mar 6, 2025 · Updated last year
- [NeurIPS'24] Efficient and accurate memory-saving method towards W4A4 large multi-modal models. ★99 · Jan 3, 2025 · Updated last year
- PyTorch implementation of PTQ4DiT (https://arxiv.org/abs/2405.16005) ★46 · Nov 8, 2024 · Updated last year
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ★229 · Jan 11, 2025 · Updated last year
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ★262 · Aug 9, 2025 · Updated 7 months ago
- Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs ★54 · Mar 13, 2026 · Updated last week
- ★102 · Feb 26, 2026 · Updated 3 weeks ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ★363 · Nov 20, 2025 · Updated 4 months ago
- ★34 · Mar 28, 2025 · Updated 11 months ago
- Reorder-based post-training quantization for large language models ★199 · May 17, 2023 · Updated 2 years ago
- [EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models, including LLMs, VLMs, and video generative models. ★691 · Mar 11, 2026 · Updated 2 weeks ago
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs ★123 · Jul 4, 2025 · Updated 8 months ago
- Code repository for "Evaluating Quantized Large Language Models" ★135 · Sep 8, 2024 · Updated last year
- Official PyTorch implementation of "Rethinking Guidance Information to Utilize Unlabeled Samples: A Label-Encoding Perspective" ★19 · Sep 27, 2024 · Updated last year
- [ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models ★28 · Aug 5, 2025 · Updated 7 months ago
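Most repositories above build on the same primitive: mapping floating-point weights or activations to a small signed integer grid. As shared background only, here is a minimal per-tensor symmetric round-to-nearest sketch in plain Python; the function names and the max-magnitude scale choice are illustrative assumptions, not the method of DuQuant or any listed repo.

```python
def quantize_symmetric(xs, num_bits=4):
    """Symmetric round-to-nearest quantization to signed num_bits integers.

    Per-tensor scale chosen from the max magnitude (an illustrative choice;
    real methods often use per-channel/per-group scales or clipping).
    """
    qmax = 2 ** (num_bits - 1) - 1                 # 7 for 4-bit (range -8..7)
    scale = max(abs(x) for x in xs) / qmax         # one scale for the whole tensor
    q = [max(-qmax - 1, min(qmax, round(x / scale))) for x in xs]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate floats."""
    return [v * scale for v in q]


weights = [0.1, -0.5, 0.7, -0.02]
q, s = quantize_symmetric(weights)          # q == [1, -5, 7, 0]
approx = dequantize(q, s)                   # each value within scale/2 of the original
```

Note how the smallest weight (-0.02) collapses to 0: a single large value stretches the scale and wastes grid resolution, which is exactly the outlier problem that rotation/transformation methods in this list (DuQuant, QuaRot, SpinQuant, etc.) aim to mitigate before quantizing.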