triton-inference-server / tensorrt_backend

The Triton backend for TensorRT.

☆79

Alternatives and similar repositories for tensorrt_backend

Users who are interested in tensorrt_backend are comparing it to the libraries listed below.


triton-inference-server / dali_backend
The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API.
☆139 Updated 3 weeks ago
triton-inference-server / onnxruntime_backend
The Triton backend for the ONNX Runtime.
☆168 Updated this week
triton-inference-server / model_navigator
Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs.
☆213 Updated 7 months ago
triton-inference-server / backend
Common source, scripts and utilities for creating Triton backends.
☆360 Updated 3 weeks ago
triton-inference-server / model_analyzer
Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv…
☆499 Updated last week
triton-inference-server / common
Common source, scripts and utilities shared across all Triton repositories.
☆77 Updated last week
triton-inference-server / pytorch_backend
The Triton backend for PyTorch TorchScript models.
☆166 Updated last week
triton-inference-server / vllm_backend
☆317 Updated last week
triton-inference-server / perf_analyzer
☆123 Updated 3 weeks ago
YH-Wu / Triton-Inference-Server-on-Kubernetes
☆33 Updated 3 years ago
triton-inference-server / client
Triton Python, C++ and Java client libraries, and gRPC-generated client examples for Go, Java and Scala.
☆665 Updated last week
triton-inference-server / paddlepaddle_backend
☆36 Updated last year
triton-inference-server / python_backend
Triton backend that enables pre-processing, post-processing and other logic to be implemented in Python.
☆660 Updated this week
microsoft / onnxconverter-common
Common utilities for ONNX converters
☆287 Updated 3 months ago
tsingmicro-toolchain / OnnxSlim
A Toolkit to Help Optimize Large Onnx Model
☆162 Updated last month
triton-inference-server / triton_cli
Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen…
☆72 Updated 3 weeks ago
inisis / OnnxSlim
A Toolkit to Help Optimize Onnx Model
☆256 Updated this week
onnx / neural-compressor
Model compression for ONNX
☆99 Updated last year
torchpipe / torchpipe
Serving Inside Pytorch
☆165 Updated 2 weeks ago
inisis / OnnxLLM
Large Language Model Onnx Inference Framework
☆36 Updated last week
NetEase-FuXi / EETQ
Easy and Efficient Quantization for Transformers
☆203 Updated 5 months ago
triton-inference-server / openvino_backend
OpenVINO backend for Triton.
☆34 Updated 3 weeks ago
wangkuiyi / huggingface-tokenizer-in-cxx
☆70 Updated 2 years ago
triton-inference-server / developer_tools
☆21 Updated 3 weeks ago
triton-inference-server / core
The core library and APIs implementing the Triton Inference Server.
☆156 Updated this week
neuralmagic / AutoFP8
☆205 Updated 6 months ago
meta-pytorch / multipy
torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i…
☆182 Updated 3 months ago
triton-inference-server / pytriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
☆829 Updated 3 months ago
microsoft / batch-inference
Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios.
☆106 Updated last year
Rayrtfr / FasterTransformer
Transformer related optimization, including BERT, GPT
☆17 Updated 2 years ago