MegEngine / MegPeak

☆254

Alternatives and similar repositories for MegPeak

Users who are interested in MegPeak are comparing it to the libraries listed below.

MegEngine / mperf
mperf is an operator performance tuning toolbox for mobile and embedded platforms.
☆191 Updated 2 years ago
MegEngine / MegCC
MegCC is a deep learning model compiler with an ultra-lightweight runtime that is efficient and easy to port.
☆491 Updated last year
BBuf / how-to-optimize-gemm
☆98 Updated 4 years ago
OAID / AutoKernel
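AutoKernel is a simple, easy-to-use, low-barrier automatic operator optimization tool that improves the efficiency of deploying deep learning algorithms.
☆743 Updated 3 years ago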
XiaoMi / nnlib
Fork of https://source.codeaurora.org/quic/hexagon_nn/nnlib
☆58 Updated 2 years ago
Cambricon / mlu-ops
Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).
☆141 Updated this week
pigirons / cpufp
A CPU tool for benchmarking peak floating-point performance
☆568 Updated 4 months ago
pigirons / sgemm_hsw
This is an implementation of sgemm_kernel on L1d cache.
☆231 Updated last year
tpoisonooo / chgemm
symmetric int8 gemm
☆67 Updated 5 years ago
OpenPPL / CuAssembler
An unofficial CUDA assembler for all generations of SASS, hopefully :)
☆84 Updated 2 years ago
PaddlePaddle / CINN
Compiler Infrastructure for Neural Networks
☆147 Updated 2 years ago
flagos-ai / flagtree
FlagTree is a unified compiler for multiple AI chips, which is forked from triton-lang/triton.
☆137 Updated this week
BBuf / ArmNeonOptimization
arm-neon
☆92 Updated last year
njuhope / cuda_sgemm
☆116 Updated last year
pigirons / conv3x3_m1
This is a demo of how to write a high-performance convolution that runs on Apple silicon
☆57 Updated 3 years ago
QianyanTech / NBAssembler
Assembler and Decompiler for NVIDIA (Maxwell, Pascal, Volta, Turing, Ampere) GPUs.
☆91 Updated 2 years ago
flagos-ai / FlagPerf
FlagPerf is an open-source software platform for benchmarking AI chips.
☆354 Updated 3 weeks ago
bytedance / byteir
A model compilation solution for various hardware
☆456 Updated 3 months ago
Archermmt / tvm_walk_through
Code reading for TVM
☆76 Updated 3 years ago
alibaba / heterogeneity-aware-lowering-and-optimization
heterogeneity-aware-lowering-and-optimization
☆256 Updated last year
OpenPPL / ppl.cv
ppl.cv is a high-performance image processing library from OpenPPL that supports various platforms.
☆512 Updated last year
AI-performance / embedded-ai.bench
Benchmark for embedded-ai deep learning inference engines, such as NCNN / TNN / MNN / TensorFlow Lite etc.
☆204 Updated 4 years ago
OpenPPL / ppl.llm.kernel.cuda
☆152 Updated 10 months ago
MARD1NO / CUDA-PPT
☆113 Updated 8 months ago
nicolaswilde / cuda-tensorcore-hgemm
☆156 Updated 11 months ago
StrongSpoon / tvm.schedule
Examples for the TVM schedule API
☆102 Updated 2 years ago
OpenPPL / ppl.common
Common libraries for PPL projects
☆30 Updated 8 months ago
bytedance / ByteMLPerf
AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver…
☆276 Updated 3 months ago
carlushuang / cpu_gemm_opt
How to design a CPU GEMM on x86 with AVX256 that can beat OpenBLAS.
☆73 Updated 6 years ago
OpenPPL / ppl.kernel.cpu
☆19 Updated last year