[NeurIPS 2024 Oral 🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
★ 180 · Updated Apr 24, 2026 (last week)
Alternatives and similar repositories for DuQuant
Users interested in DuQuant are comparing it to the libraries listed below.
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization · ★ 171 · Updated Nov 26, 2025 (5 months ago)
- Code for the NeurIPS 2024 paper "QuaRot": end-to-end 4-bit inference for large language models · ★ 506 · Updated Nov 26, 2024 (last year)
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization" · ★ 214 · Updated Nov 25, 2025 (5 months ago)
- [ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs · ★ 896 · Updated Nov 26, 2025 (5 months ago)
- Code repo for the paper "SpinQuant: LLM Quantization with Learned Rotations" · ★ 390 · Updated Feb 14, 2025 (last year)
- [ACL 2024] Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact" · ★ 45 · Updated May 24, 2024 (last year)
- [ACL 2024] A novel QAT framework with self-distillation to enhance ultra-low-bit LLMs · ★ 137 · Updated May 16, 2024 (last year)
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving · ★ 338 · Updated Jul 2, 2024 (last year)
- Official implementation of the EMNLP 2023 paper "Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…" · ★ 51 · Updated Oct 21, 2023 (2 years ago)
- ★ 52 · Updated Nov 5, 2024 (last year)
- ★ 591 · Updated Oct 29, 2024 (last year)
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models · ★ 337 · Updated Apr 10, 2026 (3 weeks ago)
- [EMNLP 2024] Quantize LLMs to extremely low bit-widths and finetune the quantized models · ★ 15 · Updated Jul 18, 2024 (last year)
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations · ★ 244 · Updated Sep 30, 2024 (last year)
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation · ★ 237 · Updated Aug 18, 2025 (8 months ago)
- [COLM 2025] DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; Zhihu write-up: https://zhuanlan.zhihu.c… · ★ 30 · Updated Mar 5, 2025 (last year)
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation · ★ 156 · Updated Mar 21, 2025 (last year)
- Model Compression Toolbox for Large Language Models and Diffusion Models · ★ 779 · Updated Aug 14, 2025 (8 months ago)
- PB-LLM: Partially Binarized Large Language Models · ★ 155 · Updated Nov 20, 2023 (2 years ago)
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization · ★ 420 · Updated Aug 13, 2024 (last year)
- PyTorch code for the paper "Progressive Binarization with Semi-Structured Pruning for LLMs" · ★ 13 · Updated Mar 11, 2026 (last month)
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models · ★ 1,644 · Updated Jul 12, 2024 (last year)
- Official PyTorch implementation of the NeurIPS 2022 (Spotlight) paper "Outlier Suppression: Pushing the Limit of Low-bit Transformer L…" · ★ 49 · Updated Oct 5, 2022 (3 years ago)
- The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models" · ★ 88 · Updated Mar 17, 2025 (last year)
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… · ★ 835 · Updated Mar 6, 2025 (last year)
- PyTorch implementation of PTQ4DiT (https://arxiv.org/abs/2405.16005) · ★ 46 · Updated Nov 8, 2024 (last year)
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs · ★ 229 · Updated Jan 11, 2025 (last year)
- [NeurIPS'24] An efficient and accurate memory-saving method for W4A4 large multi-modal models · ★ 101 · Updated Jan 3, 2025 (last year)
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training · ★ 262 · Updated Aug 9, 2025 (8 months ago)
- ★ 107 · Updated Feb 26, 2026 (2 months ago)
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache · ★ 390 · Updated Nov 20, 2025 (5 months ago)
- ★ 34 · Updated Mar 28, 2025 (last year)
- Reorder-based post-training quantization for large language models · ★ 199 · Updated May 17, 2023 (2 years ago)
- [EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models, including LLMs, VLMs, and video generative models · ★ 711 · Updated Apr 1, 2026 (last month)
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs · ★ 125 · Updated Jul 4, 2025 (10 months ago)
- Code repository for "Evaluating Quantized Large Language Models" · ★ 135 · Updated Sep 8, 2024 (last year)
- Official PyTorch implementation of "Rethinking Guidance Information to Utilize Unlabeled Samples: A Label-Encoding Perspective" · ★ 19 · Updated Sep 27, 2024 (last year)
- [ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models · ★ 29 · Updated Aug 5, 2025 (8 months ago)
- [NeurIPS 2024 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models · ★ 187 · Updated Jan 1, 2025 (last year)
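Most of the repositories above build on the same primitive: mapping floating-point weights or activations to low-bit integers with a scale factor. As a rough illustration of what terms like "W4A4" refer to, here is a minimal sketch of symmetric per-tensor quantization in plain NumPy. The function names and the 4-bit choice are illustrative only and are not drawn from any of the listed projects, which use far more sophisticated schemes (rotations, smoothing, learned clipping, etc.):

```python
import numpy as np

def quantize_symmetric(x: np.ndarray, n_bits: int = 4):
    """Symmetric per-tensor quantization: map floats to signed n-bit ints."""
    qmax = 2 ** (n_bits - 1) - 1          # e.g. 7 for 4-bit
    scale = np.abs(x).max() / qmax        # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map the integers back to (approximate) floats."""
    return q.astype(np.float32) * scale

x = np.array([0.1, -0.5, 2.0, -2.0], dtype=np.float32)
q, s = quantize_symmetric(x, n_bits=4)    # q holds 4-bit codes in int8 storage
x_hat = dequantize(q, s)                  # reconstruction with rounding error
```

A single large outlier in `x` inflates `scale` and wastes most of the 4-bit grid on the rest of the tensor, which is exactly the problem that outlier-handling methods like DuQuant, QuaRot, and SmoothQuant address.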