Qualcomm-AI-research / gptvq

☆40

Alternatives and similar repositories for gptvq

Users who are interested in gptvq are comparing it to the libraries listed below.


cat538 / SKVQ
[COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
☆25 · Oct 5, 2024 · Updated last year
RUCAIBox / QuantizedEmpirical
☆15 · Sep 24, 2023 · Updated 2 years ago
Qualcomm-AI-research / lr-qat
☆52 · Nov 5, 2024 · Updated last year
Aaronhuang-778 / SliM-LLM
[ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
☆51 · Aug 9, 2024 · Updated last year
zhangsichengsjtu / AFPQ
AFPQ code implementation
☆23 · Nov 6, 2023 · Updated 2 years ago
ChengZhang-98 / LQER
Official implementation of ICML'24 paper "LQER: Low-Rank Quantization Error Reconstruction for LLMs"
☆19 · Jul 11, 2024 · Updated last year
stellaraccident / mlir-py-release
☆12 · Jul 9, 2021 · Updated 4 years ago
iankur / vqllm
Residual vector quantization for KV cache compression in large language models
☆11 · Oct 22, 2024 · Updated last year
ByteDance-Seed / decoupleQ
A quantization algorithm for LLMs
☆148 · Jun 21, 2024 · Updated last year
snu-mllab / GuidedQuant
Official PyTorch implementation of "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance" (ICML 2025)
☆50 · Jul 6, 2025 · Updated 7 months ago
NaelF / BinaryCoP
Binary Neural Network-based COVID-19 Face-Mask Wear and Positioning Predictor on Edge Devices
☆12 · Jul 1, 2021 · Updated 4 years ago
Dao-AILab / fast-hadamard-transform
Fast Hadamard transform in CUDA, with a PyTorch interface
☆284 · Oct 19, 2025 · Updated 3 months ago
IST-DASLab / OBC
Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".
☆129 · Jul 11, 2023 · Updated 2 years ago
xvyaward / owq
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…
☆69 · Mar 7, 2024 · Updated last year
TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆14 · Nov 23, 2024 · Updated last year
uwsampa / mcpat
McPAT modeling framework
☆12 · Oct 18, 2014 · Updated 11 years ago
HuangOwen / RoLoRA
[EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
☆37 · Sep 24, 2024 · Updated last year
facebookresearch / SpinQuant
Code repo for the paper "SpinQuant: LLM quantization with learned rotations"
☆372 · Feb 14, 2025 · Updated last year
BrotherHappy / OSTQuant
[ICLR2025]: OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitt…
☆88 · Apr 8, 2025 · Updated 10 months ago
nasosger / MuToR
[NeurIPS '25] Multi-Token Prediction Needs Registers
☆26 · Dec 14, 2025 · Updated 2 months ago
ConCopilot / concopilot
Making AI and LLM app components reusable, replaceable, portable, and flexible.
☆24 · Apr 28, 2024 · Updated last year
htqin / IR-QLoRA
[ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…
☆67 · Apr 15, 2024 · Updated last year
ChenMnZ / PrefixQuant
An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization
☆172 · Nov 26, 2025 · Updated 2 months ago
BaiTheBest / SparseLLM
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
☆67 · Mar 27, 2025 · Updated 10 months ago
IST-DASLab / SparseFinetuning
Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry
☆42 · Jan 15, 2024 · Updated 2 years ago
NJUNLP / MCSD
Multi-Candidate Speculative Decoding
☆39 · Apr 22, 2024 · Updated last year
spcl / QuaRot
Code for the NeurIPS 2024 paper QuaRot: end-to-end 4-bit inference of large language models.
☆482 · Nov 26, 2024 · Updated last year
PingchengDong / GQA-LUT
The official implementation of the DAC 2024 paper GQA-LUT
☆20 · Dec 20, 2024 · Updated last year
ducdauge / sft-llm
Scaling Sparse Fine-Tuning to Large Language Models
☆18 · Jan 31, 2024 · Updated 2 years ago
ttambe / AdaptivFloat
Adaptive floating-point based numerical format for resilient deep learning
☆14 · Apr 11, 2022 · Updated 3 years ago
ucb-bar / cva6-wrapper
Wrapper for ETH Ariane Core
☆22 · Sep 2, 2025 · Updated 5 months ago
fkiaee / sparsecnn
Implementation of an ADMM-based sparse CNN architecture.
☆12 · Aug 30, 2017 · Updated 8 years ago
Qualcomm-AI-research / FP8-quantization
☆169 · Mar 9, 2023 · Updated 2 years ago
ruikangliu / FlatQuant
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
☆211 · Nov 25, 2025 · Updated 2 months ago
Qualcomm-AI-research / oscillations-qat
☆79 · Jul 21, 2022 · Updated 3 years ago
astra-sim / astra-network-analytical
☆20 · Nov 12, 2025 · Updated 3 months ago
Hsu1023 / DuQuant
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
☆180 · Oct 3, 2024 · Updated last year
OpenSparseLLMs / Linearization
☆66 · Jul 8, 2025 · Updated 7 months ago
SNU-ARC / any-precision-llm
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
☆123 · Jul 4, 2025 · Updated 7 months ago