A high-throughput and memory-efficient inference and serving engine for LLMs
☆25 · Mar 5, 2026 · Updated 2 months ago
Alternatives and similar repositories for upstreaming-to-vllm
Users who are interested in upstreaming-to-vllm are comparing it to the libraries listed below.
- ☆17 · May 18, 2026 · Updated last week
- ☆34 · May 14, 2026 · Updated last week
- Project showing how to develop NKI kernels for Llama 3.2 1B inference