HanGuo97 / flute
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
☆380 · Updated 9 months ago
Alternatives and similar repositories for flute
Users who are interested in flute are comparing it to the libraries listed below.
- Fast low-bit matmul kernels in Triton ☆423 · Updated 3 weeks ago
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization ☆397 · Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆233 · Updated this week
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆255 · Updated last year