Cornell-RelaxML/yaqa-quantization

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Cornell-RelaxML/yaqa-quantization)

Cornell-RelaxML / yaqa-quantization

☆84

Alternatives and similar repositories for yaqa-quantization

Users that are interested in yaqa-quantization are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Cornell-RelaxML / qtip
View on GitHub
☆180Jun 22, 2025Updated last year
Cornell-RelaxML / quip-sharp
View on GitHub
☆600Oct 29, 2024Updated last year
chu-tianxiang / QuIP-for-all
View on GitHub
QuIP quantization
☆66Mar 17, 2024Updated 2 years ago
turboderp-org / exllamav3
View on GitHub
An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs
☆1,054Updated this week
Debrup-61 / RaDeR
View on GitHub
Official Code Repositiry for "RaDeR: Reasoning-aware Dense Retrieval Models" accepted at Main Conference EMNLP 2025
☆18Jun 23, 2025Updated last year
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
facebookresearch / SpinQuant
View on GitHub
Code repo for the paper "SpinQuant LLM quantization with learned rotations"
☆415Feb 14, 2025Updated last year
Qualcomm-AI-research / gptvq
View on GitHub
☆42Mar 28, 2024Updated 2 years ago
ChenMnZ / PrefixQuant
View on GitHub
An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization
☆176Nov 26, 2025Updated 7 months ago
IST-DASLab / gptq-gguf-toolkit
View on GitHub
Efficient non-uniform quantization with GPTQ for GGUF
☆64Sep 17, 2025Updated 10 months ago
ruikangliu / FlatQuant
View on GitHub
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
☆223Nov 25, 2025Updated 7 months ago
wejoncy / QLLM
View on GitHub
A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily.
☆190Mar 23, 2026Updated 3 months ago
krafton-ai / lexico
View on GitHub
KV cache compression via sparse coding
☆17Oct 26, 2025Updated 8 months ago
IST-DASLab / QuEST
View on GitHub
Work in progress.
☆80Nov 25, 2025Updated 7 months ago
VolkanSimsir / Auto-Inference
View on GitHub
☆17Mar 20, 2026Updated 4 months ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
amazon-science / mxfp4-llm
View on GitHub
Official implementation for Training LLMs with MXFP4
☆130Apr 25, 2025Updated last year
IST-DASLab / MatGPTQ
View on GitHub
Code for MatGPTQ: Accurate and Efficient Post-Training Matryoshka Quantization
☆22Feb 18, 2026Updated 5 months ago
NYCU-EDgeAi / subspec
View on GitHub
[NeurIPS 2025] Speculate Deep and Accurate
☆21Jan 16, 2026Updated 6 months ago
shawnricecake / squant
View on GitHub
[ICCAD 2025] Squant
☆15Jul 3, 2025Updated last year
thad0ctor / KrunchWrapper
View on GitHub
☆18Jul 1, 2025Updated last year
fajrmn / kokoro-on-browser
View on GitHub
☆16Feb 1, 2025Updated last year
OpenGVLab / EfficientQAT
View on GitHub
[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
☆342Apr 10, 2026Updated 3 months ago
dropbox / gemlite
View on GitHub
Fast low-bit matmul kernels in Triton
☆477Updated this week
microsoft / VPTQ
View on GitHub
VPTQ, A Flexible and Extreme low-bit quantization algorithm
☆682Apr 25, 2025Updated last year
End-to-end encrypted cloud storage - Proton Drive • Ad
Special offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
Qualcomm-AI-research / lr-qat
View on GitHub
☆54Nov 5, 2024Updated last year
thu-nics / PM-KVQ
View on GitHub
The official code implementation for paper "PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs"
☆29May 24, 2025Updated last year
z-lab / paroquant
View on GitHub
[ICLR 2026] ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
☆325Jul 1, 2026Updated 2 weeks ago
togethercomputer / saw-int4
View on GitHub
Official implementation of Paper "System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving"
☆30Apr 17, 2026Updated 3 months ago
JingyangXiang / DFRot
View on GitHub
[COLM 2025] DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; 知乎：https://zhuanlan.zhihu.c…
☆30Mar 5, 2025Updated last year
snu-mllab / GuidedQuant
View on GitHub
Official PyTorch implementation of "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance" (ICML 2025)
☆53Apr 13, 2026Updated 3 months ago
hyperfocAIs / Attend
View on GitHub
Attend - to what matters.
☆17Feb 22, 2025Updated last year
wdlctc / headinfer
View on GitHub
☆63May 16, 2025Updated last year
dlwns147 / amq
View on GitHub
[EMNLP 2025] AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models
☆16Apr 29, 2026Updated 2 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
chenzx921020 / MoEQuant
View on GitHub
☆17Apr 7, 2025Updated last year
Summer-Summer / Kitty
View on GitHub
Algorithm-System Co-design: accurate and efficient 2-bit KV cache quantization for LLM Inference.
☆17May 20, 2026Updated 2 months ago
facebookresearch / ParetoQ
View on GitHub
This repository contains the training code of ParetoQ introduced in our work "ParetoQ Scaling Laws in Extremely Low-bit LLM Quantization"
☆131Oct 15, 2025Updated 9 months ago
HazyResearch / ThunderMittens
View on GitHub
☆19Aug 26, 2025Updated 10 months ago
OpenBitSys / BitDistiller
View on GitHub
[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.
☆139May 16, 2024Updated 2 years ago
NVlabs / QeRL
View on GitHub
[ICLR 2026]QeRL enables RL for 32B LLMs on a single H100 GPU.
☆511Mar 30, 2026Updated 3 months ago
matt-c1 / llama-3-quant-comparison
View on GitHub
Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2.
☆169May 16, 2024Updated 2 years ago