IBM / onnx-mlir-serving

ONNX Serving is a C++ project that serves onnx-mlir compiled models over gRPC and other protocols. Because it is implemented in C++, ONNX Serving has very low latency overhead and high throughput. It provides dynamic batch aggregation and a worker pool to fully utilize the AI accelerators on the machine.
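To illustrate what dynamic batch aggregation with a worker pool can look like, here is a minimal C++ sketch. It is not taken from the onnx-mlir-serving code base: `Request`, `BatchQueue`, `next_batch`, and `run_workers` are hypothetical names, and the batch-size cap and wait timeout are assumed tuning knobs. The idea is that requests accumulate in a queue, and a worker takes a batch once it is full or once a short wait has elapsed.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical request type; a real server would carry input tensors.
struct Request { int id; };

// Collects requests and hands them out in batches of at most max_batch.
class BatchQueue {
public:
    BatchQueue(std::size_t max_batch, std::chrono::milliseconds max_wait)
        : max_batch_(max_batch), max_wait_(max_wait) {}

    void push(Request r) {
        { std::lock_guard<std::mutex> lock(mu_); pending_.push_back(r); }
        cv_.notify_all();
    }

    // Signals that no more requests will arrive.
    void close() {
        { std::lock_guard<std::mutex> lock(mu_); closed_ = true; }
        cv_.notify_all();
    }

    // Blocks until a request is available, then waits up to max_wait for the
    // batch to fill. Returns an empty vector once closed and drained.
    std::vector<Request> next_batch() {
        std::unique_lock<std::mutex> lock(mu_);
        for (;;) {
            cv_.wait(lock, [&] { return closed_ || !pending_.empty(); });
            if (pending_.empty()) return {};  // closed and nothing left
            auto deadline = std::chrono::steady_clock::now() + max_wait_;
            cv_.wait_until(lock, deadline, [&] {
                return closed_ || pending_.size() >= max_batch_;
            });
            if (pending_.empty()) continue;  // another worker took them
            std::vector<Request> batch;
            while (!pending_.empty() && batch.size() < max_batch_) {
                batch.push_back(pending_.front());
                pending_.pop_front();
            }
            return batch;
        }
    }

private:
    std::size_t max_batch_;
    std::chrono::milliseconds max_wait_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<Request> pending_;
    bool closed_ = false;
};

// Runs n_workers threads that drain the queue until it is closed and empty.
// Returns the size of every batch processed (order is not deterministic).
std::vector<std::size_t> run_workers(BatchQueue& q, std::size_t n_workers) {
    std::mutex out_mu;
    std::vector<std::size_t> sizes;
    std::vector<std::thread> pool;
    for (std::size_t i = 0; i < n_workers; ++i) {
        pool.emplace_back([&] {
            for (;;) {
                std::vector<Request> batch = q.next_batch();
                if (batch.empty()) break;
                // A real server would run model inference on the batch here.
                std::lock_guard<std::mutex> lock(out_mu);
                sizes.push_back(batch.size());
            }
        });
    }
    for (std::thread& t : pool) t.join();
    return sizes;
}
```

In this sketch, a larger `max_batch` or `max_wait` trades per-request latency for throughput, since each worker waits longer to hand the accelerator a fuller batch.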
22 · Updated last year

Alternatives and similar repositories for onnx-mlir-serving:

Users who are interested in onnx-mlir-serving are comparing it to the libraries listed below.