[NeurIPS 2024 Oralπ₯] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
β180Apr 24, 2026Updated last month
Alternatives and similar repositories for DuQuant
Users that are interested in DuQuant are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantizationβ174Nov 26, 2025Updated 5 months ago
- Code for Neurips24 paper: QuaRot, an end-to-end 4-bit inference of large language models.β509Nov 26, 2024Updated last year
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"β216Nov 25, 2025Updated 5 months ago
- [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.β897Nov 26, 2025Updated 5 months ago
- Code repo for the paper "SpinQuant LLM quantization with learned rotations"β395Feb 14, 2025Updated last year
- Deploy to Railway using AI coding agents - Free Credits Offer β’ AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- [ACL 2024] Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact"β45May 24, 2024Updated 2 years ago
- [ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.β138May 16, 2024Updated 2 years ago
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Servingβ340Jul 2, 2024Updated last year
- Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and optiβ¦β51Oct 21, 2023Updated 2 years ago
- β52Nov 5, 2024Updated last year
- β590Oct 29, 2024Updated last year
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Modelsβ339Apr 10, 2026Updated last month
- [EMNLP 2024] Quantize LLM to extremely low-bit, and finetune the quantized LLMsβ15Jul 18, 2024Updated last year
- An acceleration library that supports arbitrary bit-width combinatorial quantization operationsβ244Sep 30, 2024Updated last year
- Managed Kubernetes at scale on DigitalOcean β’ AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generationβ236Aug 18, 2025Updated 9 months ago
- [COLM 2025] DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; η₯δΉοΌhttps://zhuanlan.zhihu.cβ¦β30Mar 5, 2025Updated last year
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generationβ160Mar 21, 2025Updated last year
- PB-LLM: Partially Binarized Large Language Modelsβ156Nov 20, 2023Updated 2 years ago
- Model Compression Toolbox for Large Language Models and Diffusion Modelsβ783Aug 14, 2025Updated 9 months ago
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantizationβ424Aug 13, 2024Updated last year
- PyTorch code for our paper "Progressive Binarization with Semi-Structured Pruning for LLMs"β13Mar 11, 2026Updated 2 months ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Modelsβ1,650Jul 12, 2024Updated last year
- The official PyTorch implementation of the NeurIPS2022 (spotlight) paper, Outlier Suppression: Pushing the Limit of Low-bit Transformer Lβ¦β49Oct 5, 2022Updated 3 years ago
- Managed Kubernetes at scale on DigitalOcean β’ AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models"β89Mar 17, 2025Updated last year
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Seβ¦β838Mar 6, 2025Updated last year
- PyTorch implementation of PTQ4DiT https://arxiv.org/abs/2405.16005β47Nov 8, 2024Updated last year
- [ICLR2025]: OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fittβ¦β92Apr 8, 2025Updated last year
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMsβ233Jan 11, 2025Updated last year
- [NeurIPS'24]Efficient and accurate memory saving method towards W4A4 large multi-modal models.β101Jan 3, 2025Updated last year
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Trainingβ264Aug 9, 2025Updated 9 months ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cacheβ399Nov 20, 2025Updated 6 months ago
- β115Feb 26, 2026Updated 2 months ago
- Deploy to Railway using AI coding agents - Free Credits Offer β’ AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- β34Mar 28, 2025Updated last year
- Reorder-based post-training quantization for large language modelβ199May 17, 2023Updated 3 years ago
- [EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.β715May 14, 2026Updated last week
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMsβ125Jul 4, 2025Updated 10 months ago
- Code Repository of Evaluating Quantized Large Language Modelsβ135Sep 8, 2024Updated last year
- Official PyTorch implementation of Rethinking Guidance Information to Utilize Unlabeled Samples: A Label-Encoding Perspective.β19Sep 27, 2024Updated last year
- [ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Modelsβ29Aug 5, 2025Updated 9 months ago