EmbeddedLLM / vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
☆90 · Updated this week
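For context, here is a minimal sketch of vLLM's offline-inference API (the model name is illustrative; check the repo's README for the current interface):

```python
from vllm import LLM, SamplingParams

# Load a model and set sampling options (model name is illustrative).
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Batched generation; vLLM schedules requests for high throughput.
outputs = llm.generate(["Hello, my name is", "The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```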
Alternatives and similar repositories for vllm
Users interested in vllm are comparing it to the libraries listed below.
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆277 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs. ☆266 · Updated last year
- ☆120 · Updated last year
- 1.58-bit LLaMa model (ternary weights; see the sketch after this list). ☆83 · Updated last year
- Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs
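As a rough illustration of the 1.58-bit idea mentioned above (not the repo's actual code), ternary quantization maps each weight to {-1, 0, +1} plus a per-tensor scale. The sketch below loosely follows the BitNet b1.58 absmean scheme:

```python
import torch

def ternary_quantize(w: torch.Tensor):
    # Absmean scaling (a sketch, not the repo's exact method):
    # divide by the mean absolute value, then round each weight
    # to the nearest value in {-1, 0, +1}.
    scale = w.abs().mean().clamp(min=1e-8)
    q = (w / scale).round().clamp(-1, 1)
    return q, scale

w = torch.randn(4, 4)
q, s = ternary_quantize(w)
print(q)                          # entries in {-1., 0., 1.}
print((q * s - w).abs().mean())   # mean quantization error
```

Each weight then needs only log2(3) ≈ 1.58 bits of storage, which is where the name comes from.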