ztxz16 / fastllm

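fastllm is a high-performance large-model inference library implemented in C++ with no backend dependencies (it depends only on CUDA and does not require PyTorch). It can run inference on the DeepSeek R1 671B INT4 model with a single 4090, reaching 20+ tokens per second (tps) for a single stream.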
3,495 · Updated this week

Alternatives and similar repositories for fastllm:

Users who are interested in fastllm are comparing it to the libraries listed below.