neuralmagic / AutoFP8
☆ 145 · Updated last month
Related projects:
- A general 2–8 bit quantization toolbox with GPTQ/AWQ/HQQ and easy export to ONNX/ONNX Runtime. ☆ 141 · Updated 3 weeks ago
- Easy and Efficient Quantization for Transformers. ☆ 172 · Updated 2 months ago
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving. ☆ 258 · Updated 2 months ago
- KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization. ☆ 282 · Updated last month
- Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5). ☆ 173 · Updated 3 months ago
- QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving. ☆ 399 · Updated 2 weeks ago
- Applied AI experiments and examples for PyTorch. ☆ 123 · Updated last month
- An easy-to-use LLM quantization and inference toolkit based on the GPTQ algorithm (weight-only quantization). ☆ 90 · Updated this week