triton-inference-server / openvino_backend
OpenVINO backend for Triton.
☆30 · Updated last week
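To use this backend, a model in the Triton model repository selects it via the `backend` field of its `config.pbtxt`. A minimal sketch is shown below; the model name, tensor names, and shapes are hypothetical placeholders, not taken from this repository.

```
# models/my_openvino_model/config.pbtxt (hypothetical model name)
name: "my_openvino_model"
backend: "openvino"    # selects the OpenVINO backend
max_batch_size: 0      # dims below describe the full tensor shape, batch dim included
input [
  {
    name: "INPUT0"             # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ 1, 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT0"            # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ 1, 1000 ]
  }
]
```

The OpenVINO model file itself (e.g. `model.xml`/`model.bin`) sits in a numbered version subdirectory next to this config, following Triton's standard model-repository layout.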
Alternatives and similar repositories for openvino_backend:
Users interested in openvino_backend are comparing it to the libraries listed below.
- The Triton backend for the ONNX Runtime. ☆136 · Updated this week
- The Triton backend for PyTorch TorchScript models. ☆141 · Updated last week
- The Triton backend for TensorRT. ☆68 · Updated 2 weeks ago
- Triton Model Navigator is an inference toolkit designed for optimizing and deploying deep learning models, with a focus on NVIDIA GPUs. ☆193 · Updated 2 weeks ago
- Common source, scripts, and utilities for creating Triton backends. ☆306 · Updated 2 weeks ago
- ☆37 · Updated this week
- The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API. ☆132 · Updated 2 weeks ago
- ☆32 · Updated 11 months ago
- Common source, scripts, and utilities shared across all Triton repositories. ☆67 · Updated 2 weeks ago
- Nsight Compute in Docker ☆11 · Updated last year
- Model compression for ONNX ☆81 · Updated 2 months ago
- Triton CLI is an open-source command line interface that enables users to create, deploy, and profile models served by the Triton Inference Server. ☆52 · Updated this week
- ☆54 · Updated last year
- Triton Model Analyzer is a CLI tool that helps users understand the compute and memory requirements of Triton Inference Server models. ☆448 · Updated 2 weeks ago
- The Triton backend for TensorFlow. ☆47 · Updated last week
- ☆30 · Updated 2 years ago
- ☆218 · Updated this week
- An inference framework for the llama model, implemented in CUDA C++. ☆44 · Updated 2 months ago
- MLPerf™ logging library ☆32 · Updated 3 weeks ago
- Transformer-related optimizations, including BERT and GPT ☆17 · Updated last year
- ☆69 · Updated last year
- Common utilities for ONNX converters ☆257 · Updated last month
- The core library and APIs implementing the Triton Inference Server. ☆114 · Updated this week
- ☆18 · Updated last week
- Notes and artifacts from the ONNX steering committee ☆25 · Updated last week
- oneCCL Bindings for PyTorch* ☆87 · Updated 3 weeks ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆99 · Updated 4 months ago
- Large Language Model Text Generation Inference on Habana Gaudi ☆31 · Updated this week
- ☆58 · Updated 8 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆50 · Updated this week