guqiong96 / Lvllm

LvLLM is a NUMA-aware extension of vLLM. It makes fuller use of CPU and memory resources to reduce GPU memory requirements, combines an efficient GPU-parallel and NUMA-parallel architecture, and supports hybrid CPU/GPU inference for large MoE models.
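To illustrate the hybrid-inference idea in the description, here is a minimal, hypothetical sketch (not LvLLM's actual API) of one way GPU memory pressure can be reduced for MoE models: keep only the most frequently routed experts resident on the GPU and serve the rest from CPU/NUMA-node memory. The function name, parameters, and placement policy are assumptions for illustration only.

```python
# Hypothetical sketch, not LvLLM's real implementation: place the hottest
# experts on the GPU and leave cold experts in CPU memory.

from collections import Counter

def plan_expert_placement(routing_counts, gpu_slots):
    """Assign each expert to "gpu" or "cpu".

    routing_counts: mapping expert_id -> how often the router selected it.
    gpu_slots: how many experts fit in GPU memory.
    """
    # Pick the `gpu_slots` most frequently routed experts for the GPU.
    hot = {e for e, _ in Counter(routing_counts).most_common(gpu_slots)}
    return {e: ("gpu" if e in hot else "cpu") for e in routing_counts}

# Example: 8 experts, GPU memory for only 2 of them.
counts = {0: 50, 1: 3, 2: 41, 3: 7, 4: 2, 5: 9, 6: 1, 7: 4}
placement = plan_expert_placement(counts, gpu_slots=2)
```

In this toy policy, experts 0 and 2 (the two most-used) land on the GPU while the other six stay in host memory; a real system would also have to move activations between devices and could re-plan placement as routing statistics drift.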
78 stars · Updated last week

Alternatives and similar repositories for Lvllm

Users who are interested in Lvllm are comparing it to the libraries listed below.
