triton-inference-server / onnxruntime_backend

The Triton backend for the ONNX Runtime.

☆159

Alternatives and similar repositories for onnxruntime_backend

Users who are interested in onnxruntime_backend are comparing it to the libraries listed below.

triton-inference-server / pytorch_backend
The Triton backend for PyTorch TorchScript models.
☆158 · Updated 2 weeks ago
triton-inference-server / backend
Common source, scripts and utilities for creating Triton backends.
☆338 · Updated 2 weeks ago
triton-inference-server / model_analyzer
Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv…
☆484 · Updated 2 weeks ago
triton-inference-server / model_navigator
Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs.
☆210 · Updated 4 months ago
triton-inference-server / tensorrt_backend
The Triton backend for TensorRT.
☆78 · Updated 2 weeks ago
triton-inference-server / common
Common source, scripts and utilities shared across all Triton repositories.
☆76 · Updated 2 weeks ago
triton-inference-server / vllm_backend
☆289 · Updated 2 weeks ago
microsoft / onnxconverter-common
Common utilities for ONNX converters
☆276 · Updated last month
triton-inference-server / client
Triton Python, C++, and Java client libraries, and gRPC-generated client examples for Go, Java, and Scala.
☆641 · Updated this week
triton-inference-server / triton_cli
Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen…
☆66 · Updated 2 weeks ago
triton-inference-server / core
The core library and APIs implementing the Triton Inference Server.
☆146 · Updated 2 weeks ago
triton-inference-server / perf_analyzer
☆103 · Updated last week
triton-inference-server / python_backend
Triton backend that enables pre-processing, post-processing, and other logic to be implemented in Python.
☆631 · Updated last week
microsoft / onnxscript
ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python.
☆370 · Updated this week
microsoft / onnxruntime-extensions
onnxruntime-extensions: A specialized pre- and post-processing library for ONNX Runtime
☆407 · Updated last week
triton-inference-server / fastertransformer_backend
☆412 · Updated last year
microsoft / batch-inference
Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios.
☆102 · Updated last year
pytorch / ort
Accelerate PyTorch models with ONNX Runtime
☆363 · Updated 5 months ago
pytorch / multipy
torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i…
☆180 · Updated last month
triton-inference-server / openvino_backend
OpenVINO backend for Triton.
☆34 · Updated 2 weeks ago
anyscale / llm-continuous-batching-benchmarks
☆120 · Updated last year
microsoft / onnxruntime-training-examples
Examples for using ONNX Runtime for model training.
☆339 · Updated 10 months ago
NetEase-FuXi / EETQ
Easy and Efficient Quantization for Transformers
☆201 · Updated last month
pytorch / torchx
TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup…
☆382 · Updated this week
neuralmagic / AutoFP8
☆195 · Updated 3 months ago
triton-inference-server / dali_backend
The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API.
☆136 · Updated 2 weeks ago
triton-inference-server / tensorflow_backend
The Triton backend for TensorFlow.
☆54 · Updated 2 months ago
huggingface / optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
☆482 · Updated this week
triton-inference-server / pytriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
☆816 · Updated last week
mlc-ai / tokenizers-cpp
Universal cross-platform tokenizers binding to HF and sentencepiece
☆374 · Updated 2 weeks ago