Official implementation of the ICLR 2024 paper AffineQuant
☆28 · Mar 30, 2024 · Updated last year
Alternatives and similar repositories for AffineQuant
Users who are interested in AffineQuant are comparing it to the libraries listed below.
- [TMLR] Official PyTorch implementation of paper "Efficient Quantization-aware Training with Adaptive Coreset Selection" ☆37 · Aug 20, 2024 · Updated last year
- ☆25 · Oct 31, 2024 · Updated last year
- Code for the NeurIPS 2024 paper: QuaRot, end-to-end 4-bit inference of large language models. ☆485 · Nov 26, 2024 · Updated last year
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization" ☆210 · Nov 25, 2025 · Updated 3 months ago
- ☆12 · Aug 26, 2022 · Updated 3 years ago
- ☆30 · Jul 22, 2024 · Updated last year
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization ☆172 · Nov 26, 2025 · Updated 3 months ago
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" ☆374 · Feb 14, 2025 · Updated last year
- ☆15 · Sep 24, 2023 · Updated 2 years ago
- ☆36 · Mar 29, 2023 · Updated 2 years ago
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ☆38 · Sep 24, 2024 · Updated last year
- [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs. ☆890 · Nov 26, 2025 · Updated 3 months ago
- Scaling Sparse Fine-Tuning to Large Language Models ☆18 · Jan 31, 2024 · Updated 2 years ago
- Reorder-based post-training quantization for large language models ☆199 · May 17, 2023 · Updated 2 years ago
- [ICLR 2025, IEEE TPAMI 2026] Mixture Compressor & MC# ☆68 · Feb 12, 2025 · Updated last year
- ☆33 · Mar 28, 2025 · Updated 11 months ago
- ☆28 · Nov 5, 2021 · Updated 4 years ago
- AFPQ code implementation ☆23 · Nov 6, 2023 · Updated 2 years ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ☆23 · Mar 15, 2024 · Updated last year
- Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model… ☆69 · Mar 7, 2024 · Updated last year
- [CVPR 2023] PD-Quant: Post-Training Quantization Based on Prediction Difference Metric ☆60 · Mar 23, 2023 · Updated 2 years ago
- [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs. ☆180 · Oct 3, 2024 · Updated last year
- For releasing code related to compression methods for transformers, accompanying our publications ☆454 · Jan 16, 2025 · Updated last year
- [EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models. ☆680 · Nov 19, 2025 · Updated 3 months ago
- [ICLR2025]: OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitt… ☆88 · Apr 8, 2025 · Updated 10 months ago
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… ☆67 · Apr 15, 2024 · Updated last year
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ☆3,443 · Jul 17, 2025 · Updated 7 months ago
- Unofficial implementations of block/layer-wise pruning methods for LLMs. ☆77 · Apr 29, 2024 · Updated last year
- [ICCV 2023] Q-Diffusion: Quantizing Diffusion Models. ☆370 · Mar 21, 2024 · Updated last year
- Modification of daveshap/ChromaDB_Chatbot_Public that allows for end-users to customize the behavior/memories of the chatbot ☆13 · Jun 30, 2023 · Updated 2 years ago
- ☆41 · Mar 28, 2024 · Updated last year
- Binary neural networks developed by Huawei Noah's Ark Lab ☆29 · Feb 19, 2021 · Updated 5 years ago
- ACL 2023 ☆39 · Jun 6, 2023 · Updated 2 years ago
- Repository for go shared libraries (for now). ☆11 · Dec 1, 2025 · Updated 3 months ago
- ☆23 · Oct 14, 2025 · Updated 4 months ago
- The official implementation of the ICML 2023 paper OFQ-ViT ☆39 · Oct 3, 2023 · Updated 2 years ago
- A quantization algorithm for LLMs ☆148 · Jun 21, 2024 · Updated last year
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving ☆336 · Jul 2, 2024 · Updated last year
- ProxQuant: Quantized Neural Networks via Proximal Operators ☆30 · Feb 19, 2019 · Updated 7 years ago