MegEngine / mperf
mperf is an operator performance tuning toolbox for mobile/embedded platforms
☆178Updated last year
Alternatives and similar repositories for mperf:
Users that are interested in mperf are comparing it to the libraries listed below
- ☆95Updated 3 years ago
- arm-neon☆90Updated 7 months ago
- How to design a CPU GEMM on x86 with AVX256 that can beat OpenBLAS.☆68Updated 5 years ago
- symmetric int8 gemm☆66Updated 4 years ago
- A demo of how to write a high-performance convolution that runs on Apple silicon☆54Updated 3 years ago
- ☆82Updated last year
- ☆35Updated 5 months ago
- ☆109Updated 11 months ago
- code reading for tvm☆75Updated 3 years ago
- ☆245Updated last year
- Common libraries for PPL projects☆29Updated last week
- Fork of https://source.codeaurora.org/quic/hexagon_nn/nnlib☆57Updated last year
- row-major matmul optimization☆610Updated last year
- Yinghan's Code Sample☆313Updated 2 years ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :)☆82Updated 2 years ago
- ☆113Updated last year
- MegCC is a deep learning model compiler with an ultra-lightweight runtime that is efficient and easy to port☆480Updated 4 months ago
- Chinese translation of the CUDA PTX-ISA document☆37Updated last week
- examples for tvm schedule API☆99Updated last year
- Benchmark for embedded-AI deep learning inference engines, such as NCNN / TNN / MNN / TensorFlow Lite, etc.☆203Updated 4 years ago
- ARM NEON documentation and instruction semantics☆242Updated 5 years ago
- A layered, decoupled deep learning inference engine☆72Updated last month
- This is an implementation of sgemm_kernel on L1d cache.☆225Updated last year
- ☆17Updated 11 months ago
- Tencent NCNN with added CUDA support☆69Updated 4 years ago
- heterogeneity-aware-lowering-and-optimization☆254Updated last year
- Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).☆110Updated this week
- pdf☆89Updated 6 years ago
- A simple high performance CUDA GEMM implementation.☆352Updated last year
- ☆145Updated 2 months ago