hma02 / cublasHgemm-P100

Code for testing native float16 matrix multiplication performance on Tesla P100 and V100 GPUs, based on cublasHgemm

☆35

Alternatives and similar repositories for cublasHgemm-P100

Users that are interested in cublasHgemm-P100 are comparing it to the libraries listed below

vinx13 / tvm-cuda-int8-benchmark
Benchmark of TVM quantized model on CUDA
☆111 · Updated 5 years ago
hyln9 / GCNGEMM
Optimized half precision gemm assembly kernels (deprecated due to ROCm)
☆47 · Updated 8 years ago
merrymercy / tvm-mali
Optimizing Mobile Deep Learning on ARM GPU with TVM
☆181 · Updated 7 years ago
xuqiantong / CUDA-Winograd
Fast CUDA Kernels for ResNet Inference.
☆180 · Updated 6 years ago
zhxfl / CUDA-CNN
CNN accelerated by CUDA. Tested on MNIST, finally reaching 99.76%.
☆185 · Updated 8 years ago
IntelLabs / SkimCaffe
Caffe for Sparse Convolutional Neural Network
☆237 · Updated 2 years ago
yuxianzhi / Top-K
A way to use CUDA to accelerate the top-k algorithm
☆30 · Updated 8 years ago
carlushuang / cpu_gemm_opt
How to design a CPU GEMM on x86 with avx256 that can beat OpenBLAS.
☆72 · Updated 6 years ago
CSshengxy / MEC
ICML2017 MEC: Memory-efficient Convolution for Deep Neural Network, C++ implementation (unofficial)
☆17 · Updated 6 years ago
tpoisonooo / chgemm
symmetric int8 gemm
☆67 · Updated 5 years ago
AI-performance / embedded-ai.bench
Benchmark for embedded-AI deep learning inference engines, such as NCNN / TNN / MNN / TensorFlow Lite etc.
☆204 · Updated 4 years ago
xingyul / sparse-winograd-cnn
Efficient Sparse-Winograd Convolutional Neural Networks (ICLR 2018)
☆193 · Updated 6 years ago
XiuYuLi / deepcore_source_code
Subpart source code of deepcore v0.7
☆27 · Updated 5 years ago
FrozenGene / tvm-tutorial
TVM tutorial
☆66 · Updated 6 years ago
XiuYuLi / flexible-gemm
flexible-gemm conv of deepcore
☆17 · Updated 5 years ago
intel / optimized-models
☆26 · Updated 2 years ago
mlcommons / inference_results_v0.5
This repository contains the results and code for the MLPerf™ Inference v0.5 benchmark.
☆55 · Updated 3 months ago
tlc-pack / tophub
tophub autotvm log collections
☆69 · Updated 2 years ago
tvmai / meetup-slides
Place for meetup slides
☆140 · Updated 5 years ago
whitelok / tvm-lesson
Hands-on tutorial for learning the core principles of TVM
☆63 · Updated 4 years ago
csehydrogen / Winograd-OpenCL
Winograd-based convolution implementation in OpenCL
☆28 · Updated 8 years ago
anilshanbhag / gpu-topk
Efficient Top-K implementation on the GPU
☆188 · Updated 6 years ago
alibaba / heterogeneity-aware-lowering-and-optimization
heterogeneity-aware-lowering-and-optimization
☆256 · Updated last year
andravin / wincnn
Winograd minimal convolution algorithm generator for convolutional neural networks.
☆623 · Updated 5 years ago
lyuchuny3 / Tengine_gemm_tutorial
Tengine gemm tutorial, step by step
☆13 · Updated 4 years ago
tobegit3hub / tftvm
TensorFlow and TVM integration
☆36 · Updated 5 years ago
pytorch / tvm
TVM integration into PyTorch
☆454 · Updated 5 years ago
strin / gemm-android
Tutorial on optimizing GEMM performance on Android
☆51 · Updated 9 years ago
CAS-CLab / CNN-Inference-Engine-Quick-View
A quick view of high-performance convolutional neural network (CNN) inference engines on mobile devices.
☆151 · Updated 3 years ago
XiaoMi / nnlib
Fork of https://source.codeaurora.org/quic/hexagon_nn/nnlib
☆58 · Updated 2 years ago