BrotherHappy/OSTQuant

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/BrotherHappy/OSTQuant)

BrotherHappy / OSTQuant

[ICLR2025]: OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting

☆93

Alternatives and similar repositories for OSTQuant

Users that are interested in OSTQuant are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ruikangliu / FlatQuant
View on GitHub
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
☆223Nov 25, 2025Updated 7 months ago
utkarsh-dmx / project-resq
View on GitHub
☆35Mar 28, 2025Updated last year
facebookresearch / SpinQuant
View on GitHub
Code repo for the paper "SpinQuant LLM quantization with learned rotations"
☆415Feb 14, 2025Updated last year
FFTYYY / RaanA
View on GitHub
Implementation of "RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm"
☆17Apr 11, 2025Updated last year
JingyangXiang / DFRot
View on GitHub
[COLM 2025] DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; 知乎：https://zhuanlan.zhihu.c…
☆30Mar 5, 2025Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
ChengZhang-98 / LQER
View on GitHub
Official implementation of ICML'24 paper "LQER: Low-Rank Quantization Error Reconstruction for LLMs"
☆19Jul 11, 2024Updated 2 years ago
Intelligent-Computing-Lab-Panda / GPTAQ
View on GitHub
Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)
☆92Jul 28, 2025Updated 11 months ago
facebookresearch / ParetoQ
View on GitHub
This repository contains the training code of ParetoQ introduced in our work "ParetoQ Scaling Laws in Extremely Low-bit LLM Quantization"
☆131Oct 15, 2025Updated 9 months ago
Xingyu-Zheng / FOEM
View on GitHub
(AAAI 2026) First-Order Error Matters: Accurate Compensation for Quantized Large Language Models
☆16Apr 16, 2026Updated 3 months ago
pprp / STBLLM
View on GitHub
[ICLR25] STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
☆20Jun 3, 2025Updated last year
StiphyJay / MQuant
View on GitHub
[ACM MM2025]: MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization
☆44Aug 13, 2025Updated 11 months ago
spcl / QuaRot
View on GitHub
Code for Neurips24 paper: QuaRot, an end-to-end 4-bit inference of large language models.
☆523Nov 26, 2024Updated last year
OpenBitSys / BitDistiller
View on GitHub
[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.
☆139May 16, 2024Updated 2 years ago
Dao-AILab / fast-hadamard-transform
View on GitHub
Fast Hadamard transform in CUDA, with a PyTorch interface
☆340Mar 10, 2026Updated 4 months ago
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
ylsung / rsq
View on GitHub
Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs"
☆23Mar 25, 2026Updated 3 months ago
HandH1998 / QQQ
View on GitHub
QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.
☆157Aug 21, 2025Updated 11 months ago
Hsu1023 / DuQuant
View on GitHub
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
☆186Apr 24, 2026Updated 2 months ago
NathanGodey / qfilters
View on GitHub
Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)
☆34Mar 7, 2025Updated last year
hahnyuan / PB-LLM
View on GitHub
PB-LLM: Partially Binarized Large Language Models
☆158Nov 20, 2023Updated 2 years ago
NVlabs / EoRA
View on GitHub
[ICLRW'26] EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
☆49Apr 21, 2026Updated 3 months ago
ChengZhang-98 / QERA
View on GitHub
Official implementation of the ICLR'25 paper "QERA: an Analytical Framework for Quantization Error Reconstruction".
☆14Feb 4, 2025Updated last year
ChenMnZ / PrefixQuant
View on GitHub
An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization
☆176Nov 26, 2025Updated 7 months ago
nbasyl / OFQ
View on GitHub
The official implementation of the ICML 2023 paper OFQ-ViT
☆39Oct 3, 2023Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
zhangsichengsjtu / AFPQ
View on GitHub
AFPQ code implementation
☆23Nov 6, 2023Updated 2 years ago
xvyaward / owq
View on GitHub
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…
☆72Mar 7, 2024Updated 2 years ago
IST-DASLab / HALO
View on GitHub
HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx…
☆31Feb 17, 2025Updated last year
ziplab / QLLM
View on GitHub
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…
☆31Mar 12, 2024Updated 2 years ago
pprp / Awesome-LLM-Quantization
View on GitHub
Awesome list for LLM quantization
☆432Apr 20, 2026Updated 3 months ago
Cornell-RelaxML / qtip
View on GitHub
☆180Jun 22, 2025Updated last year
ModelTC / Outlier_Suppression_Plus
View on GitHub
Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…
☆52Oct 21, 2023Updated 2 years ago
Aaronhuang-778 / SliM-LLM
View on GitHub
[ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
☆62Aug 9, 2024Updated last year
LeanModels / LeanQuant
View on GitHub
Code repository for ICLR 2025 paper "LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid"
☆29Mar 2, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
Juanerx / Q-DiT
View on GitHub
[CVPR 2025] Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers
☆79Sep 3, 2024Updated last year
zhengchen3 / HLS_Transformer
View on GitHub
c++ version of ViT
☆12Nov 13, 2022Updated 3 years ago
66RING / CritiPrefill
View on GitHub
Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs".
☆17Sep 15, 2024Updated last year
OpenGVLab / OmniQuant
View on GitHub
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
☆901Nov 26, 2025Updated 7 months ago
INT-FlashAttention2024 / INT-FlashAttention
View on GitHub
☆91Jan 23, 2025Updated last year
Qualcomm-AI-research / pruning-vs-quantization
View on GitHub
☆26Mar 1, 2024Updated 2 years ago
Cornell-RelaxML / QuIP
View on GitHub
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
☆399Feb 24, 2024Updated 2 years ago