xuyuzhuang11 / OneBit
The homepage of the OneBit model quantization framework.
☆175 Updated 2 months ago
Alternatives and similar repositories for OneBit:
Users who are interested in OneBit are comparing it to the repositories listed below:
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆263 Updated 6 months ago
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ☆214 Updated 3 months ago
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization ☆340 Updated 8 months ago
- ☆197 Updated 4 months ago
- For releasing code related to compression methods for transformers, accompanying our publications ☆424 Updated 3 months ago
- PB-LLM: Partially Binarized Large Language Models ☆151 Updated last year
- ☆126 Updated last month
- scalable and robust tree-based speculative decoding algorithm ☆343 Updated 2 months ago
- Unofficial implementations of block/layer-wise pruning methods for LLMs. ☆68 Updated 11 months ago
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods. ☆181 Updated 3 months ago
- The official implementation of the EMNLP 2023 paper LLM-FP4 ☆197 Updated last year
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆288 Updated 3 months ago