skyde1021 / CUDA_CNN

Various versions (CPU, CUDA_NAIVE, CUDA_TILED, GEMM) of convolutional neural network implementations by Heechul Lim

☆30

Alternatives and similar repositories for CUDA_CNN

Users who are interested in CUDA_CNN are comparing it to the libraries listed below.

yuxianzhi / Top-K
A way to use CUDA to accelerate the top-k algorithm
☆30Updated 8 years ago
romulus0914 / CNN_VGG19_CUDA
Convolutional neural network (VGG19 model) accelerated with CUDA
☆12Updated 7 years ago
xuqiantong / CUDA-Winograd
Fast CUDA Kernels for ResNet Inference.
☆182Updated 6 years ago
pytorch / tvm
TVM integration into PyTorch
☆456Updated 5 years ago
whitelok / tvm-lesson
Hands-on tutorial for learning the core principles of TVM
☆63Updated 5 years ago
keithyin / read-pytorch-source-code
PyTorch source code reading, version 0.2.0
☆91Updated 6 years ago
HadXu / Thunder
A small deep-learning framework with C++/Python/CUDA
☆54Updated 7 years ago
tengkz / tensorflow_notes
TensorFlow source code reading notes
☆193Updated 7 years ago
d2l-ai / d2l-tvm
Dive into Deep Learning Compiler
☆645Updated 3 years ago
tbennun / cudnn-training
A CUDNN minimal deep learning training code sample using LeNet.
☆268Updated 2 years ago
zhxfl / CUDA-CNN
CNN accelerated by CUDA. Tested on MNIST, finally reaching 99.76%
☆187Updated 8 years ago
alibaba / heterogeneity-aware-lowering-and-optimization
heterogeneity-aware-lowering-and-optimization
☆257Updated last year
matazure / mtensor
a c++/cuda template library for tensor lazy evaluation
☆164Updated 2 years ago
NVIDIA / kmeans
kmeans clustering with multi-GPU capabilities
☆121Updated 2 years ago
BBuf / Memory-efficient-Convolution-for-Deep-Neural-Network
☆22Updated 5 years ago
vinx13 / tvm-cuda-int8-benchmark
Benchmark of TVM quantized model on CUDA
☆112Updated 5 years ago
ConstantPark / DL_Compiler
Study Group of Deep Learning Compiler
☆166Updated 2 years ago
sallenkey-wei / cuda-handbook
pdf
☆92Updated 7 years ago
hma02 / cublasHgemm-P100
Code for testing the native float16 matrix multiplication performance on Tesla P100 and V100 GPU based on cublasHgemm
☆35Updated 6 years ago
carlushuang / cpu_gemm_opt
How to design a CPU GEMM on x86 with avx256 that can beat OpenBLAS.
☆73Updated 6 years ago
csehydrogen / Winograd-OpenCL
Winograd-based convolution implementation in OpenCL
☆28Updated 8 years ago
merrymercy / tvm-mali
Optimizing Mobile Deep Learning on ARM GPU with TVM
☆181Updated 7 years ago
NVIDIA / tensorrt-laboratory
Explore the Capabilities of the TensorRT Platform
☆264Updated 4 years ago
snuspl / parallax
A Tool for Automatic Parallelization of Deep Learning Training in Distributed Multi-GPU Environments.
☆132Updated 3 years ago
FrozenGene / tvm-tutorial
TVM tutorial
☆66Updated 6 years ago
deeperlearning / professional-cuda-c-programming
☆481Updated 10 years ago
flame / blislab
BLISlab: A Sandbox for Optimizing GEMM
☆553Updated 4 years ago
CAS-CLab / CNN-Inference-Engine-Quick-View
A quick view of high-performance convolutional neural network (CNN) inference engines on mobile devices.
☆151Updated 3 years ago
xingyul / sparse-winograd-cnn
Efficient Sparse-Winograd Convolutional Neural Networks (ICLR 2018)
☆193Updated 6 years ago
snuspl / nimble
Lightweight and Parallel Deep Learning Framework
☆263Updated 3 years ago