xuyuzhuang11/OneBit

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/xuyuzhuang11/OneBit)

xuyuzhuang11 / OneBit

The homepage of OneBit model quantization framework.

☆206

Alternatives and similar repositories for OneBit

Users that are interested in OneBit are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

OpenGVLab / EfficientQAT
View on GitHub
[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
☆342Apr 10, 2026Updated 3 months ago
Agora-Lab-AI / BitNet-a4.8
View on GitHub
BitNet a4.8 Implementation in one file of pytorch
☆24Jan 13, 2025Updated last year
Aaronhuang-778 / BiLLM
View on GitHub
[ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
☆235Jan 11, 2025Updated last year
Qualcomm-AI-research / gptvq
View on GitHub
☆42Mar 28, 2024Updated 2 years ago
iLearn-Lab / ACL25-PTQ1.61
View on GitHub
☆15Apr 6, 2026Updated 3 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
IST-DASLab / qmoe
View on GitHub
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆277Nov 3, 2023Updated 2 years ago
AI-Efficiency / IR-QLoRA
View on GitHub
[ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…
☆65Apr 15, 2024Updated 2 years ago
Qualcomm-AI-research / lr-qat
View on GitHub
☆54Nov 5, 2024Updated last year
nbasyl / LLM-FP4
View on GitHub
The official implementation of the EMNLP 2023 paper LLM-FP4
☆226Dec 15, 2023Updated 2 years ago
ChenMnZ / PrefixQuant
View on GitHub
An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization
☆176Nov 26, 2025Updated 8 months ago
microsoft / VPTQ
View on GitHub
VPTQ, A Flexible and Extreme low-bit quantization algorithm
☆682Apr 25, 2025Updated last year
HuangOwen / QAT-ACS
View on GitHub
[TMLR] Official PyTorch implementation of paper "Efficient Quantization-aware Training with Adaptive Coreset Selection"
☆39Aug 20, 2024Updated last year
hahnyuan / PB-LLM
View on GitHub
PB-LLM: Partially Binarized Large Language Models
☆158Nov 20, 2023Updated 2 years ago
imagination-research / LCSC
View on GitHub
[ICLR 2025] Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better
☆16Feb 15, 2025Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
chhzh123 / ptc-tutorial
View on GitHub
PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo
☆17Mar 13, 2023Updated 3 years ago
OpenBitSys / BitDistiller
View on GitHub
[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.
☆139May 16, 2024Updated 2 years ago
OpenGVLab / OmniQuant
View on GitHub
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
☆903Nov 26, 2025Updated 8 months ago
UNITES-Lab / MoE-Quantization
View on GitHub
Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark"
☆31Jun 30, 2025Updated last year
duterscmy / CD-MoE
View on GitHub
Official PyTorch implementation of CD-MOE
☆12Mar 18, 2026Updated 4 months ago
microsoft / BitBLAS
View on GitHub
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
☆769Aug 6, 2025Updated 11 months ago
HandH1998 / QQQ
View on GitHub
QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.
☆158Aug 21, 2025Updated 11 months ago
hahnyuan / RPTQ4LLM
View on GitHub
Reorder-based post-training quantization for large language model
☆199May 17, 2023Updated 3 years ago
IST-DASLab / marlin
View on GitHub
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
☆1,111Sep 4, 2024Updated last year
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
imagination-research / EEP
View on GitHub
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
☆25Nov 11, 2025Updated 8 months ago
Intelligent-Computing-Lab-Panda / TesseraQ
View on GitHub
☆25Oct 31, 2024Updated last year
dropbox / hqq
View on GitHub
Official implementation of Half-Quadratic Quantization (HQQ)
☆948Feb 26, 2026Updated 5 months ago
cat538 / SKVQ
View on GitHub
[COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
☆24Oct 5, 2024Updated last year
microsoft / TransformerCompression
View on GitHub
For releasing code related to compression methods for transformers, accompanying our publications
☆462Jan 16, 2025Updated last year
BillAmihom / RAPQ
View on GitHub
Pytorch implementation of RAPQ, IJCAI 2022
☆23Jul 19, 2023Updated 3 years ago
BohanLi0110 / NLP-DA-Papers
View on GitHub
☆26Nov 20, 2021Updated 4 years ago
SJTU-ReArch-Group / M2XFP_ASPLOS26
View on GitHub
[ASPLOS 2026] M2XFP: A Metadata-Augmented Microscaling Data Format for Efficient Low-bit Quantization.
☆15Jan 29, 2026Updated 6 months ago
iankur / vqllm
View on GitHub
Residual vector quantization for KV cache compression in large language model
☆12Oct 22, 2024Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
mit-han-lab / duo-attention
View on GitHub
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
☆539Feb 10, 2025Updated last year
zhangsichengsjtu / AFPQ
View on GitHub
AFPQ code implementation
☆23Nov 6, 2023Updated 2 years ago
Hsu1023 / DuQuant
View on GitHub
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
☆187Apr 24, 2026Updated 3 months ago
xvyaward / owq
View on GitHub
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…
☆72Mar 7, 2024Updated 2 years ago
SNU-ARC / any-precision-llm
View on GitHub
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
☆130Jul 4, 2025Updated last year
HanGuo97 / flute
View on GitHub
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
☆391Apr 13, 2025Updated last year
facebookresearch / LLM-QAT
View on GitHub
Code repo for the paper "LLM-QAT Data-Free Quantization Aware Training for Large Language Models"
☆327Mar 4, 2025Updated last year