IBM / onnx-mlir-serving
ONNX Serving is a project written in C++ to serve onnx-mlir compiled models over gRPC and other protocols. Benefiting from its C++ implementation, ONNX Serving has very low latency overhead and high throughput. ONNX Serving provides dynamic batch aggregation and a worker pool to fully utilize the AI accelerators on the machine.
26 stars · Updated Sep 17, 2025

Alternatives and similar repositories for onnx-mlir-serving

Users that are interested in onnx-mlir-serving are comparing it to the libraries listed below

