HanGuo97 / flute
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
☆374 Updated 4 months ago
Alternatives and similar repositories for flute
Users who are interested in flute are comparing it to the libraries listed below.
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆153 Updated this week
- Fast low-bit matmul kernels in Triton ☆356 Updated last week
- Code for the NeurIPS 2024 paper QuaRot: end-to-end 4-bit inference of large language models. ☆418 Updated 9 months ago
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization ☆375 Updated last year