janhq / cortex.tensorrt-llm

Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA's TensorRT-LLM as a git submodule for GPU-accelerated inference on NVIDIA GPUs.
Updated 3 months ago

Alternatives and similar repositories for cortex.tensorrt-llm: