[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
⭐181 · Apr 24, 2026 · Updated last month
Alternatives and similar repositories for DuQuant
Users who are interested in DuQuant are comparing it to the libraries listed below.
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization · ⭐175 · Nov 26, 2025 · Updated 6 months ago
- Code for the NeurIPS 2024 paper QuaRot: end-to-end 4-bit inference of large language models · ⭐513 · Nov 26, 2024 · Updated last year
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization" · ⭐217 · Nov 25, 2025 · Updated 6 months ago
- [ICLR 2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs · ⭐899 · Nov 26, 2025 · Updated 6 months ago
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" · ⭐401 · Feb 14, 2025 · Updated last year
- [ACL 2024] Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact" · ⭐45 · May 24, 2024 · Updated 2 years ago
- [ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs · ⭐138 · May 16, 2024 · Updated 2 years ago
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving · ⭐341 · Jul 2, 2024 · Updated last year
- Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti… · ⭐51 · Oct 21, 2023 · Updated 2 years ago
- ⭐54 · Nov 5, 2024 · Updated last year
- ⭐595 · Oct 29, 2024 · Updated last year
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models · ⭐341 · Apr 10, 2026 · Updated 2 months ago
- [EMNLP 2024] Quantize LLMs to extremely low bit widths and fine-tune the quantized LLMs · ⭐15 · Jul 18, 2024 · Updated last year
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations · ⭐245 · Sep 30, 2024 · Updated last year
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation · ⭐236 · Aug 18, 2025 · Updated 9 months ago
- [COLM 2025] DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; Zhihu: https://zhuanlan.zhihu.c… · ⭐30 · Mar 5, 2025 · Updated last year
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation · ⭐161 · Mar 21, 2025 · Updated last year
- Model Compression Toolbox for Large Language Models and Diffusion Models · ⭐787 · Aug 14, 2025 · Updated 10 months ago
- PB-LLM: Partially Binarized Large Language Models · ⭐157 · Nov 20, 2023 · Updated 2 years ago
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization · ⭐427 · Aug 13, 2024 · Updated last year
- PyTorch code for our paper "Progressive Binarization with Semi-Structured Pruning for LLMs" · ⭐13 · Mar 11, 2026 · Updated 3 months ago
- [ICLR 2025] OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitt… · ⭐93 · Apr 8, 2025 · Updated last year
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models · ⭐1,658 · Jul 12, 2024 · Updated last year
- The official PyTorch implementation of the NeurIPS2022 (spotlight) paper, Outlier Suppression: Pushing the Limit of Low-bit Transformer L… · ⭐49 · Oct 5, 2022 · Updated 3 years ago
- The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models" · ⭐92 · Mar 17, 2025 · Updated last year
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… · ⭐843 · Mar 6, 2025 · Updated last year
- PyTorch implementation of PTQ4DiT (https://arxiv.org/abs/2405.16005) · ⭐47 · Nov 8, 2024 · Updated last year
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs · ⭐234 · Jan 11, 2025 · Updated last year
- [NeurIPS'24] Efficient and accurate memory-saving method for W4A4 large multi-modal models · ⭐102 · Jan 3, 2025 · Updated last year
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training · ⭐265 · Aug 9, 2025 · Updated 10 months ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache · ⭐405 · Nov 20, 2025 · Updated 6 months ago
- ⭐115 · Feb 26, 2026 · Updated 3 months ago
- ⭐34 · Mar 28, 2025 · Updated last year
- Reorder-based post-training quantization for large language models · ⭐199 · May 17, 2023 · Updated 3 years ago
- [EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models · ⭐723 · May 14, 2026 · Updated last month
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs · ⭐127 · Jul 4, 2025 · Updated 11 months ago
- Code repository for Evaluating Quantized Large Language Models · ⭐134 · Sep 8, 2024 · Updated last year
- Official PyTorch implementation of "Rethinking Guidance Information to Utilize Unlabeled Samples: A Label-Encoding Perspective" · ⭐19 · Sep 27, 2024 · Updated last year
- [NeurIPS 2024 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models · ⭐188 · Jan 1, 2025 · Updated last year