NetEase-Media / grps_trtllm

An OpenAI-compatible LLM service with higher performance than vLLM serve: a pure C++ high-performance service implemented with grps + TensorRT-LLM + Tokenizers.cpp, supporting chat, function calling, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.
125 · Updated last week

Alternatives and similar repositories for grps_trtllm:

Users who are interested in grps_trtllm are comparing it to the libraries listed below.