IBM / onnx-mlir-serving
ONNX Serving is a project written in C++ to serve onnx-mlir compiled models over gRPC and other protocols. Benefiting from its C++ implementation, ONNX Serving has very low latency overhead and high throughput. ONNX Serving provides dynamic batch aggregation and a worker pool to fully utilize the AI accelerators on the machine (a minimal sketch of this batching scheme appears below).
☆25 · Updated 4 months ago
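Dynamic batch aggregation with a worker pool is a common serving pattern: incoming requests are queued, and a worker pulls up to a maximum batch size, waiting at most a short timeout so small batches are not delayed indefinitely. The sketch below illustrates only that general idea; all names here (`BatchQueue`, `Request`, `pop_batch`) are hypothetical and are not the onnx-mlir-serving API.

```cpp
// Illustrative sketch of dynamic batch aggregation with a worker pool.
// Names are hypothetical; the real onnx-mlir-serving interface differs.
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Request { int id; };  // stand-in for an inference request

class BatchQueue {
 public:
  void push(Request r) {
    { std::lock_guard<std::mutex> lk(mu_); q_.push(r); }
    cv_.notify_one();
  }
  // Collect up to max_batch requests, waiting at most `timeout`
  // so a lone request is still served promptly.
  std::vector<Request> pop_batch(size_t max_batch,
                                 std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait_for(lk, timeout, [&] { return !q_.empty(); });
    std::vector<Request> batch;
    while (!q_.empty() && batch.size() < max_batch) {
      batch.push_back(q_.front());
      q_.pop();
    }
    return batch;
  }
 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<Request> q_;
};

int main() {
  BatchQueue queue;
  std::atomic<bool> stop{false};
  std::vector<std::thread> workers;
  // Worker pool: each worker drains batches and would run the
  // compiled model on them.
  for (int w = 0; w < 2; ++w) {
    workers.emplace_back([&, w] {
      while (!stop) {
        auto batch = queue.pop_batch(4, std::chrono::milliseconds(10));
        if (!batch.empty())
          std::cout << "worker " << w << " runs batch of "
                    << batch.size() << "\n";
      }
    });
  }
  for (int i = 0; i < 10; ++i) queue.push(Request{i});
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
  stop = true;
  for (auto& t : workers) t.join();
}
```

The two knobs trade off against each other: the timeout bounds tail latency for lightly loaded servers, while the maximum batch size caps how much work is handed to the accelerator at once.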
Alternatives and similar repositories for onnx-mlir-serving
Users interested in onnx-mlir-serving are comparing it to the libraries listed below.
- Play with MLIR right in your browser · ☆139 · Updated 2 years ago
- Notes and artifacts from the ONNX steering committee · ☆28 · Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) · ☆48 · Updated 5 months ago
- Open deep learning compiler stack for CPU, GPU, and specialized accelerators · ☆35 · Updated 3 years ago
- A lightweight, Pythonic frontend for MLIR · ☆81 · Updated 2 years ago
- Unified compiler/runtime for interfacing with PyTorch Dynamo · ☆104 · Updated last month
- ☆68 · Updated 2 years ago