ccs96307 / fast-llm-inference
Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers.
11 stars · Jul 1, 2025 · Updated 7 months ago
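To illustrate the first technique the description names, below is a minimal greedy speculative-decoding sketch. It is not taken from the repository: the `draft_next` and `target_next` functions are hypothetical toy stand-ins for a small draft model and a large target model. The key idea is that the draft proposes `k` tokens cheaply, the target verifies all of them in one batched pass, and the longest agreeing prefix is accepted, so the output matches target-only greedy decoding while the target runs far fewer times.

```python
def target_next(seq):
    # Toy stand-in for the large target model: deterministic greedy next token.
    return (sum(seq) * 31 + len(seq)) % 100

def draft_next(seq):
    # Toy stand-in for the small draft model; it agrees with the target
    # often enough that long accepted runs are common.
    t = target_next(seq)
    return t if len(seq) % 4 else (t + 1) % 100

def speculative_decode(prompt, n_new, k=4):
    seq = list(prompt)
    target_calls = 0
    while len(seq) < len(prompt) + n_new:
        # 1) Draft proposes k tokens autoregressively (cheap).
        draft = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # 2) Target verifies all k positions in one pass (simulated here
        #    as a single "call" yielding k+1 greedy predictions).
        target_calls += 1
        preds = [target_next(seq + draft[:i]) for i in range(k + 1)]
        # 3) Accept the longest prefix where draft and target agree,
        #    then append the target's own token at the first mismatch.
        i = 0
        while i < k and draft[i] == preds[i]:
            i += 1
        seq += draft[:i] + [preds[i]]
    return seq[:len(prompt) + n_new], target_calls

out, calls = speculative_decode([1, 2, 3], 12)
```

Because every accepted draft token equals the target's greedy choice for that prefix, `out` is identical to what plain greedy decoding with `target_next` alone would produce, but with fewer target passes (`calls` < 12 here).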

Alternatives and similar repositories for fast-llm-inference

Users interested in fast-llm-inference are comparing it to the libraries listed below.

