OpenAI-compatible API for the TensorRT-LLM Triton backend (☆220, Aug 1, 2024, updated last year)
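As an OpenAI-compatible server, openai_trtllm is typically queried with the standard chat-completions request shape. The sketch below builds such a payload; the model name `"ensemble"` is an assumption for illustration (it would correspond to whatever model is deployed in Triton), not a value mandated by the project.

```python
import json

# Standard OpenAI chat-completions payload; with openai_trtllm, "model"
# maps to the Triton model name ("ensemble" here is an assumed example).
payload = {
    "model": "ensemble",
    "messages": [
        {"role": "user", "content": "Hello"}
    ],
    "stream": False,
}

# Serialized request body as it would be POSTed to the /v1/chat/completions route
body = json.dumps(payload)
print(body)
```

Because the request shape matches the OpenAI API, existing OpenAI client libraries can usually be pointed at the server simply by overriding the base URL.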
Alternatives and similar repositories for openai_trtllm
Users interested in openai_trtllm are comparing it to the libraries listed below.
- The Triton TensorRT-LLM Backend (☆926, updated this week)
- High-level API for tar-based dataset (☆12, Feb 3, 2024, updated 2 years ago)
- (☆332, Feb 9, 2026, updated last month)
- Hands-on LLM deployment: TensorRT-LLM, Triton Inference Server, vLLM (☆27, Feb 26, 2024, updated 2 years ago)
- AI Router (☆14, Aug 1, 2024, updated last year)
- The driver for LMCache core to run in vLLM (☆62, Feb 4, 2025, updated last year)
- TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat… (☆12,993, updated this week)
- A high-throughput and memory-efficient inference and serving engine for LLMs (☆267, Dec 4, 2025, updated 3 months ago)
- A high-throughput and memory-efficient inference and serving engine for LLMs (☆17, Jun 3, 2024, updated last year)
- JAX bindings for the flash-attention3 kernels (☆22, Jan 2, 2026, updated 2 months ago)
- A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresse…