snu-mllab/GuidedQuant

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/snu-mllab/GuidedQuant)

snu-mllab / GuidedQuant

Official PyTorch implementation of "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance" (ICML 2025)

☆53

Alternatives and similar repositories for GuidedQuant

Users that are interested in GuidedQuant are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Xingyu-Zheng / FOEM
View on GitHub
(AAAI 2026) First-Order Error Matters: Accurate Compensation for Quantized Large Language Models
☆16Apr 16, 2026Updated 3 months ago
utkarsh-dmx / project-resq
View on GitHub
☆35Mar 28, 2025Updated last year
Intelligent-Computing-Lab-Panda / GPTAQ
View on GitHub
Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)
☆92Jul 28, 2025Updated 11 months ago
snu-mllab / LayerMerge
View on GitHub
Official PyTorch implementation of "LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging" (ICML 2024)
☆31Apr 13, 2026Updated 3 months ago
wlfeng0509 / Q-VDiT
View on GitHub
(ICML-2025) Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers
☆21Aug 13, 2025Updated 11 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
ruikangliu / FlatQuant
View on GitHub
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
☆223Nov 25, 2025Updated 8 months ago
snu-mllab / KVzip
View on GitHub
[NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)
☆225Feb 11, 2026Updated 5 months ago
BrotherHappy / OSTQuant
View on GitHub
[ICLR2025]: OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitt…
☆94Apr 8, 2025Updated last year
facebookresearch / SpinQuant
View on GitHub
Code repo for the paper "SpinQuant LLM quantization with learned rotations"
☆417Feb 14, 2025Updated last year
lhxcs / DVD-Quant
View on GitHub
☆17Oct 5, 2025Updated 9 months ago
Qualcomm-AI-research / gptvq
View on GitHub
☆42Mar 28, 2024Updated 2 years ago
facebookresearch / ParetoQ
View on GitHub
This repository contains the training code of ParetoQ introduced in our work "ParetoQ Scaling Laws in Extremely Low-bit LLM Quantization"
☆131Oct 15, 2025Updated 9 months ago
AXERA-TECH / Qwen2.5-VL-3B-Instruct.axera
View on GitHub
Demo for Qwen2.5-VL-3B-Instruct on Axera device.
☆16Sep 3, 2025Updated 10 months ago
thooton / aspen
View on GitHub
Personal voice assistant, with voice interruption and Twilio support
☆18Feb 24, 2025Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
HuangOwen / RoLoRA
View on GitHub
[EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
☆41Sep 24, 2024Updated last year
kssteven418 / SqueezeLLM-gradients
View on GitHub
☆21Feb 5, 2024Updated 2 years ago
Janghyun1230 / FastKVzip
View on GitHub
Accurate and fast KV cache compression with a gating mechanism
☆27Apr 5, 2026Updated 3 months ago
INT-FlashAttention2024 / INT-FlashAttention
View on GitHub
☆91Jan 23, 2025Updated last year
Aaronhuang-778 / SliM-LLM
View on GitHub
[ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
☆62Aug 9, 2024Updated last year
IST-DASLab / FP-Quant
View on GitHub
☆115Feb 26, 2026Updated 5 months ago
Aaronhuang-778 / BiLLM
View on GitHub
[ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
☆235Jan 11, 2025Updated last year
xvyaward / owq
View on GitHub
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…
☆72Mar 7, 2024Updated 2 years ago
ChenMnZ / PrefixQuant
View on GitHub
An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization
☆176Nov 26, 2025Updated 8 months ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
NathanGodey / qfilters
View on GitHub
Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)
☆34Mar 7, 2025Updated last year
snu-mllab / Neural-Relation-Graph
View on GitHub
Official PyTorch implementation of "Neural Relation Graph: A Unified Framework for Identifying Label Noise and Outlier Data" (NeurIPS'23)
☆15Dec 4, 2023Updated 2 years ago
pprp / STBLLM
View on GitHub
[ICLR25] STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
☆20Jun 3, 2025Updated last year
Dao-AILab / gemm-cublas
View on GitHub
☆22May 5, 2025Updated last year
Intelligent-Computing-Lab-Panda / TesseraQ
View on GitHub
☆25Oct 31, 2024Updated last year
mert-cemri / autoevolve
View on GitHub
☆24Dec 6, 2025Updated 7 months ago
Cornell-RelaxML / QuIP
View on GitHub
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
☆401Feb 24, 2024Updated 2 years ago
snu-mllab / Context-Memory
View on GitHub
Pytorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)
☆63Apr 18, 2024Updated 2 years ago
snu-mllab / Targeted-Cause-Discovery
View on GitHub
[TMLR'25] Official implementation for "Large-Scale Targeted Cause Discovery via Learning from Simulated Data"
☆28Sep 30, 2025Updated 9 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
SNU-ARC / any-precision-llm
View on GitHub
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
☆130Jul 4, 2025Updated last year
jy-yuan / KIVI
View on GitHub
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
☆421Nov 20, 2025Updated 8 months ago
zhebrak / agtap
View on GitHub
Zero-instrumentation LLM API and MCP tracer for your agents powered by eBPF — latency, tokens, and tool use in realtime
☆18Mar 16, 2026Updated 4 months ago
MIRALab-USTC / AI4LogicSynthesis-PruneX
View on GitHub
☆10Aug 22, 2023Updated 2 years ago
wejoncy / QLLM
View on GitHub
A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily.
☆190Mar 23, 2026Updated 4 months ago
fatemehpesaran310 / lpoi
View on GitHub
Official PyTorch implementation of "LPOI: Listwise Preference Optimization for Vision Language Models" (ACL 2025 Main)
☆16Jun 19, 2026Updated last month
LinB203 / FSDP-Training
View on GitHub
Minimal PyTorch implementation of TP, SP, FSDP and sharded-EMA
☆32Nov 27, 2025Updated 7 months ago