hyln9 / GCNGEMM

Optimized half precision gemm assembly kernels (deprecated due to ROCm)

☆47

Alternatives and similar repositories for GCNGEMM

Users who are interested in GCNGEMM are comparing it to the libraries listed below.


strin / gemm-android
Tutorial on optimizing GEMM performance on Android
☆51 · Updated 9 years ago
XiuYuLi / flexible-gemm
flexible-gemm conv of deepcore
☆17 · Updated 5 years ago
ColfaxResearch / FALCON
Library for fast image convolution in neural networks on Intel Architecture
☆31 · Updated 8 years ago
naibaf7 / libdnn
Greentea LibDNN - a universal convolution implementation supporting CUDA and OpenCL
☆136 · Updated 8 years ago
dmlc / nnvm-fusion
Kernel Fusion and Runtime Compilation Based on NNVM
☆70 · Updated 8 years ago
masahi / tvm-winograd
Test of Winograd convolution written in TVM for CUDA and AMDGPU
☆41 · Updated 6 years ago
XiuYuLi / deepcore_source_code
Partial source code of deepcore v0.7
☆27 · Updated 5 years ago
ravi-teja-mullapudi / Halide-NN
CNNs in Halide
☆23 · Updated 9 years ago
dmlc / HalideIR
Symbolic Expression and Statement Module for new DSLs
☆205 · Updated 4 years ago
IntelLabs / SkimCaffe
Caffe for Sparse Convolutional Neural Networks
☆238 · Updated 2 years ago
hma02 / cublasHgemm-P100
Code for testing the native float16 matrix multiplication performance on Tesla P100 and V100 GPU based on cublasHgemm
☆34 · Updated 5 years ago
merrymercy / tvm-mali
Optimizing Mobile Deep Learning on ARM GPU with TVM
☆181 · Updated 6 years ago
CNugteren / myGEMM
Code appendix to an OpenCL matrix-multiplication tutorial
☆173 · Updated 8 years ago
Orion34-lanbo / tvm-batch-matmul-example
☆24 · Updated 7 years ago
linnanwang / BLASX
A heterogeneous multi-GPU level-3 BLAS library
☆45 · Updated 5 years ago
csehydrogen / Winograd-OpenCL
Winograd-based convolution implementation in OpenCL
☆28 · Updated 8 years ago
CNugteren / CLTune
CLTune: An automatic OpenCL & CUDA kernel tuner
☆180 · Updated 2 years ago
CSshengxy / MEC
ICML 2017 MEC: Memory-efficient Convolution for Deep Neural Network, C++ implementation (unofficial)
☆17 · Updated 6 years ago
tobegit3hub / tftvm
TensorFlow and TVM integration
☆37 · Updated 5 years ago
xianyi / clOpenBLAS
BLAS OpenCL implementation.
☆16 · Updated 10 years ago
ppwwyyxx / haDNN
Proof-of-Concept CNN in Halide
☆22 · Updated 8 years ago
carlushuang / cpu_gemm_opt
How to design a CPU GEMM on x86 with 256-bit AVX that can beat OpenBLAS
☆70 · Updated 6 years ago
zhxfl / CUDA-CNN
CNN accelerated by CUDA. Tested on MNIST, finally reaching 99.76%
☆186 · Updated 7 years ago
zhaoweicai / hwgq
Caffe implementation of accurate low-precision neural networks
☆117 · Updated 6 years ago
OAID / MXNet-HRT
Heterogeneous Run Time version of MXNet. Adds heterogeneous capabilities to MXNet and uses heterogeneous computing infrastructure frame…
☆72 · Updated 7 years ago
vinx13 / tvm-cuda-int8-benchmark
Benchmark of TVM quantized model on CUDA
☆111 · Updated 5 years ago
XiaoMi / nnlib
Fork of https://source.codeaurora.org/quic/hexagon_nn/nnlib
☆58 · Updated 2 years ago
henline / streamexecutordoc
Documentation for StreamExecutor open source proposal
☆83 · Updated 9 years ago
tlc-pack / tophub
TopHub AutoTVM log collections
☆69 · Updated 2 years ago
xuqiantong / CUDA-Winograd
Fast CUDA Kernels for ResNet Inference.
☆177 · Updated 6 years ago