ztxz16 / fastllm

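fastllm is a high-performance large language model inference library implemented in C++ with no backend dependencies (it depends only on CUDA and does not require PyTorch). It can run inference on the DeepSeek R1 671B INT4 model with a single 4090, reaching 20+ tps for a single stream.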
3,574 · Updated last week

Alternatives and similar repositories for fastllm

Users who are interested in fastllm are comparing it to the libraries listed below.
