microsoft / onnxruntime-tvm
Open deep learning compiler stack for CPU, GPU and specialized accelerators
☆34 · Updated 2 years ago
Alternatives and similar repositories for onnxruntime-tvm:
Users interested in onnxruntime-tvm are comparing it to the libraries listed below.
- ONNX Serving is a project written in C++ to serve onnx-mlir compiled models with gRPC and other protocols. Benefiting from C++ implement… ☆22 · Updated last year
- ☆69 · Updated last year
- ☆12 · Updated 5 years ago
- TensorFlow and TVM integration ☆37 · Updated 4 years ago
- An easy way to run, test, benchmark and tune OpenCL kernel files ☆23 · Updated last year
- AMD's graph optimization engine. ☆207 · Updated this week
- Self-trained Large Language Models based on Meta LLaMa ☆30 · Updated last year
- Common source, scripts and utilities shared across all Triton repositories. ☆68 · Updated this week
- A demo of how to write a high-performance convolution that runs on Apple silicon ☆52 · Updated 3 years ago
- ☆11 · Updated 3 years ago
- How to design CPU GEMM on x86 with AVX256 that can beat OpenBLAS. ☆67 · Updated 5 years ago
- Inference framework for MoE layers based on TensorRT with Python binding ☆41 · Updated 3 years ago
- Fast sparse deep learning on CPUs ☆52 · Updated 2 years ago
- ☆157 · Updated this week
- Issues related to MLPerf™ Inference policies, including rules and suggested changes ☆59 · Updated last week
- Parse TFLite models (*.tflite) EASILY with Python. Check the API at https://zhenhuaw.me/tflite/docs/ ☆98 · Updated 2 weeks ago
- Artifacts for the SOSP'19 paper "Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions" ☆21 · Updated 2 years ago
- MatMul performance benchmarks for a single CPU core, comparing both hand-engineered and codegen kernels. ☆127 · Updated last year
- Whisper in TensorRT-LLM ☆15 · Updated last year
- AI Edge Quantizer: flexible post-training quantization for LiteRT models. ☆23 · Updated this week
- Common utilities for ONNX converters ☆257 · Updated 2 months ago
- A tracing JIT for PyTorch ☆17 · Updated 2 years ago
- ☆23 · Updated last month
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆38 · Updated 9 months ago
- An optimizing compiler for decision tree ensemble inference. ☆17 · Updated this week
- Snapdragon Neural Processing Engine (SNPE) SDK. The Snapdragon Neural Processing Engine (SNPE) is a Qualcomm Snapdragon software accelerate… ☆33 · Updated 2 years ago
- Inference of quantization-aware trained networks using TensorRT ☆80 · Updated 2 years ago
- GEMM and Winograd-based convolutions using CUTLASS ☆26 · Updated 4 years ago
- Benchmarking some transformer deployments ☆26 · Updated last year