IBM / onnx-mlir-serving

ONNX Serving is a project written in C++ that serves onnx-mlir compiled models over gRPC and other protocols. Thanks to its C++ implementation, ONNX Serving has very low latency overhead and high throughput. ONNX Serving provides dynamic batch aggregation and a worker pool to fully utilize the AI accelerators on the machine.
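As a rough illustration of how dynamic batch aggregation and a worker pool can fit together, here is a minimal C++ sketch. It is not taken from onnx-mlir-serving: the class `DynamicBatcher`, its parameters (`workers`, `max_batch`, `max_wait`), and the `BatchFn` callback are all hypothetical names, and a single `float` stands in for a real tensor. The sketch assumes the batch function returns exactly one output per input and does not throw.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch; none of these names come from onnx-mlir-serving.
class DynamicBatcher {
public:
    // Stand-in for a batched model call: one output per input (assumption).
    using BatchFn = std::function<std::vector<float>(const std::vector<float>&)>;

    DynamicBatcher(std::size_t workers, std::size_t max_batch,
                   std::chrono::milliseconds max_wait, BatchFn fn)
        : max_batch_(max_batch), max_wait_(max_wait), fn_(std::move(fn)) {
        for (std::size_t i = 0; i < workers; ++i)
            pool_.emplace_back([this] { run(); });
    }

    // Drains any pending requests, then joins the workers.
    ~DynamicBatcher() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& t : pool_) t.join();
    }

    // Enqueue one request; the future resolves once its batch has run.
    std::future<float> submit(float input) {
        Item item{input, std::promise<float>()};
        std::future<float> result = item.done.get_future();
        {
            std::lock_guard<std::mutex> lock(mu_);
            queue_.push(std::move(item));
        }
        cv_.notify_one();
        return result;
    }

private:
    struct Item {
        float input;
        std::promise<float> done;
    };

    void run() {
        for (;;) {
            std::vector<Item> batch;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stop requested, nothing left
                // Give later requests a bounded window to join this batch.
                cv_.wait_for(lock, max_wait_, [this] {
                    return stop_ || queue_.size() >= max_batch_;
                });
                while (!queue_.empty() && batch.size() < max_batch_) {
                    batch.push_back(std::move(queue_.front()));
                    queue_.pop();
                }
            }
            if (batch.empty()) continue;  // another worker took the requests
            std::vector<float> inputs;
            for (const auto& item : batch) inputs.push_back(item.input);
            std::vector<float> outputs = fn_(inputs);  // one call per batch
            assert(outputs.size() == batch.size());
            for (std::size_t i = 0; i < batch.size(); ++i)
                batch[i].done.set_value(outputs[i]);
        }
    }

    std::size_t max_batch_;
    std::chrono::milliseconds max_wait_;
    BatchFn fn_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<Item> queue_;
    bool stop_ = false;
    std::vector<std::thread> pool_;  // declared last so workers start last
};
```

In this sketch a caller constructs the batcher with a worker count, a maximum batch size, a wait window, and a batch function, then calls `submit` per request and reads each result from the returned future. The wait window trades a small amount of added latency for larger batches, which is the usual design tension in dynamic batching.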
23 · Updated this week

Alternatives and similar repositories for onnx-mlir-serving

Users who are interested in onnx-mlir-serving often compare it to the libraries listed below.
