zhxfl / CUDA-CNN

CNN accelerated by CUDA. Tested on MNIST, finally reaching 99.76%.

☆186

Alternatives and similar repositories for CUDA-CNN

Users who are interested in CUDA-CNN are comparing it to the libraries listed below.

IntelLabs / SkimCaffe
Caffe for Sparse Convolutional Neural Network
☆238 Updated 2 years ago
tbennun / cudnn-training
A CUDNN minimal deep learning training code sample using LeNet.
☆267 Updated last year
NVIDIA / cnmem
A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory
☆297 Updated 6 years ago
hma02 / cublasHgemm-P100
Code for testing the native float16 matrix multiplication performance on Tesla P100 and V100 GPU based on cublasHgemm
☆34 Updated 5 years ago
hyln9 / GCNGEMM
Optimized half precision gemm assembly kernels (deprecated due to ROCm)
☆47 Updated 8 years ago
Caffe-MPI / Caffe-MPI.github.io
☆125 Updated 7 years ago
dmlc / MXNet.cpp
C++ interface for mxnet
☆115 Updated 8 years ago
dnouri / cuda-convnet
My fork of Alex Krizhevsky's cuda-convnet from 2013 where I added dropout, among other features.
☆260 Updated 10 years ago
merrymercy / tvm-mali
Optimizing Mobile Deep Learning on ARM GPU with TVM
☆181 Updated 6 years ago
chaolongzhang / algorithms-cuda
parallel algorithm based on cuda
☆60 Updated 7 years ago
CAS-CLab / quantized-cnn
An efficient framework for convolutional neural networks
☆277 Updated last year
pmgysel / caffe
Ristretto: Caffe-based approximation of convolutional neural networks.
☆291 Updated 6 years ago
strin / gemm-android
tutorial to optimize GEMM performance on android
☆51 Updated 9 years ago
andravin / wincnn
Winograd minimal convolution algorithm generator for convolutional neural networks.
☆619 Updated 4 years ago
serban / kmeans
A CUDA implementation of the k-means clustering algorithm
☆252 Updated 13 years ago
yuxianzhi / Top-K
A way to use cuda to accelerate top k algorithm
☆29 Updated 8 years ago
dmlc / nnvm-fusion
Kernel Fusion and Runtime Compilation Based on NNVM
☆70 Updated 8 years ago
XiuYuLi / deepcore_source_code
Subpart source code of deepcore v0.7
☆27 Updated 5 years ago
XiuYuLi / flexible-gemm
flexible-gemm conv of deepcore
☆17 Updated 5 years ago
neopenx / Dragon
Dragon: A Computation Graph Virtual Machine Based Deep Learning Framework.
☆175 Updated 7 years ago
openai / openai-gemm
Open single and half precision gemm implementations
☆381 Updated 2 years ago
ArchaeaSoftware / cudahandbook
Source code that accompanies The CUDA Handbook.
☆527 Updated 5 months ago
eBay / maxDNN
High Efficiency Convolution Kernel for Maxwell GPU Architecture
☆134 Updated 8 years ago
songhan / SqueezeNet-Deep-Compression
☆402 Updated 6 years ago
xuqiantong / CUDA-Winograd
Fast CUDA Kernels for ResNet Inference.
☆177 Updated 6 years ago
xingyul / sparse-winograd-cnn
Efficient Sparse-Winograd Convolutional Neural Networks (ICLR 2018)
☆191 Updated 6 years ago
hpi-xnor / BMXNet
(New version is out: https://github.com/hpi-xnor/BMXNet-v2) BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet
☆349 Updated 5 years ago
OAID / MXNet-HRT
Heterogeneous Run Time version of MXNet. Added heterogeneous capabilities to the MXNet, uses heterogeneous computing infrastructure frame…
☆72 Updated 7 years ago
naibaf7 / libdnn
Greentea LibDNN - a universal convolution implementation supporting CUDA and OpenCL
☆136 Updated 8 years ago
MatthieuCourbariaux / deep-learning-multipliers
Training deep neural networks with low precision multiplications
☆63 Updated 10 years ago