StiphyJay/MQuant

[ACM MM2025]: MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization

☆44

Alternatives and similar repositories for MQuant

Users interested in MQuant are comparing it to the repositories listed below.

thu-nics / MBQ
The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models"
☆93 · Mar 17, 2025 · Updated last year
ChangyuanWang17 / QVLM
[NeurIPS'24] Efficient and accurate memory-saving method towards W4A4 large multi-modal models.
☆103 · Jan 3, 2025 · Updated last year
BrotherHappy / OSTQuant
[ICLR2025]: OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitt…
☆94 · Apr 8, 2025 · Updated last year
ModelTC / Outlier_Suppression_Plus
Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…
☆52 · Oct 21, 2023 · Updated 2 years ago
MAC-AutoML / Awesome-Efficient-Large-Models
A list of awesome papers on compression and acceleration of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs).
☆16 · May 12, 2026 · Updated 2 months ago
ChenMnZ / PrefixQuant
An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization
☆176 · Nov 26, 2025 · Updated 8 months ago
facebookresearch / ParetoQ
This repository contains the training code of ParetoQ introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization"
☆131 · Oct 15, 2025 · Updated 9 months ago
DFQ-Dojo / dfq-toolkit
[ICCV 2025] Task-Specific Zero-shot Quantization-Aware Training for Object Detection
☆28 · Sep 26, 2025 · Updated 10 months ago
ruikangliu / FlatQuant
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
☆223 · Nov 25, 2025 · Updated 8 months ago
anminliu / VecAttention
[CVPR2026] VecAttention: Vector-wise Sparse Attention for Accelerating Long-Context Inference
☆20 · May 27, 2026 · Updated 2 months ago
Yeyke / HBLLM
[NeurIPS 2025 (spotlight)] HBLLM: Wavelet-Enhanced High-Fidelity 1-Bit Quantization for LLMs
☆16 · Dec 17, 2025 · Updated 7 months ago
Intelligent-Computing-Lab-Panda / GPTAQ
Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)
☆93 · Jul 28, 2025 · Updated last year
SAI-Lab-NYU / QSVD
This repository provides the official implementation of QSVD, a method for efficient low-rank approximation that unifies Query-Key-Value …
☆28 · May 16, 2026 · Updated 2 months ago
Xingyu-Zheng / FOEM
(AAAI 2026) First-Order Error Matters: Accurate Compensation for Quantized Large Language Models
☆16 · Apr 16, 2026 · Updated 3 months ago
ucas-xiang / QIG
[CVPR 2026] Fine-Grained Post-Training Quantization for Large Vision Language Models with Quantization-Aware Integrated Gradients
☆23 · Jun 21, 2026 · Updated last month
GoatWu / APHQ-ViT
[CVPR 2025] APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers
☆44 · Apr 7, 2025 · Updated last year
shijiew / QwenSpinQuant
Code repo for the paper "SpinQuant: LLM quantization with learned rotations"
☆15 · Mar 20, 2025 · Updated last year
chengtao-lv / PTQ4SAM
[CVPR 2024] PTQ4SAM: Post-Training Quantization for Segment Anything
☆85 · Jun 26, 2024 · Updated 2 years ago
taishan1994 / LLM-Quantization
A summary of notes on quantizing LLMs.
☆79 · Jan 8, 2026 · Updated 6 months ago
lhxcs / DVD-Quant
☆17 · Oct 5, 2025 · Updated 9 months ago
casys-kaist / oaken
Artifact for Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization
☆18 · May 9, 2025 · Updated last year
JingyangXiang / DFRot
[COLM 2025] DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; Zhihu: https://zhuanlan.zhihu.c…
☆30 · Mar 5, 2025 · Updated last year
deep-optimization / SliderQuant
The official project website of "SliderQuant: Accurate Post-Training Quantization for LLMs" (accepted to ICLR 2026).
☆25 · Jun 15, 2026 · Updated last month
pittisl / mPnP-LLM
Code for paper "Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI"
☆13 · Jan 19, 2024 · Updated 2 years ago
rhmaaa / comet-25
☆15 · Feb 11, 2025 · Updated last year
HuangOwen / QAT-ACS
[TMLR] Official PyTorch implementation of paper "Efficient Quantization-aware Training with Adaptive Coreset Selection"
☆39 · Aug 20, 2024 · Updated last year
A-suozhang / MixDQ
[ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
☆14 · Nov 27, 2024 · Updated last year
iLearn-Lab / ACL25-PTQ1.61
☆15 · Apr 6, 2026 · Updated 3 months ago
LeanModels / LeanQuant
Code repository for ICLR 2025 paper "LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid"
☆29 · Mar 2, 2025 · Updated last year
IPADS-SAI / WaferAI-SIM
The wafer-native AI accelerator simulation platform and inference engine.
☆61 · Jan 1, 2026 · Updated 6 months ago
utkarsh-dmx / project-resq
☆35 · Mar 28, 2025 · Updated last year
facebookresearch / SpinQuant
Code repo for the paper "SpinQuant: LLM quantization with learned rotations"
☆418 · Feb 14, 2025 · Updated last year
A-suozhang / ViDiT-Q
☆15 · Mar 21, 2025 · Updated last year
chowy333 / WMKPN-Burst_Super-Resolution
Official PyTorch implementation of "Weighted Multi-Kernel Prediction Network for Burst Image Super-resolution".
☆10 · Nov 9, 2021 · Updated 4 years ago
MPSC-UMBC / Efficient-Vision-Language-Models-A-Survey
[2025] Efficient Vision Language Models: A Survey
☆52 · Jul 14, 2025 · Updated last year
ModelTC / QVGen
[ICLR 2026] This is the official PyTorch implementation of "QVGen: Pushing the Limit of Quantized Video Generative Models".
☆32 · Feb 11, 2026 · Updated 5 months ago
OpenGVLab / OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
☆903 · Nov 26, 2025 · Updated 8 months ago
insuhan / calibquant
☆21 · Apr 3, 2025 · Updated last year
Hsu1023 / DuQuant
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
☆187 · Apr 24, 2026 · Updated 3 months ago