paramhanji / CUDA-CNN
Implementation of a simple CNN using CUDA
☆67 Updated 7 years ago
Alternatives and similar repositories for CUDA-CNN:
Users who are interested in CUDA-CNN are comparing it to the libraries listed below.
- Fast CUDA Kernels for ResNet Inference. ☆173 Updated 5 years ago
- CUDA for MNIST training/inference ☆40 Updated last year
- CUDA Matrix Multiplication Optimization ☆177 Updated 8 months ago
- ☆109 Updated 11 months ago
- Six major CUDA parallel computing patterns: code and notes ☆60 Updated 4 years ago
- implementation of winograd minimal convolution algorithm on Intel Architecture ☆39 Updated 7 years ago
- A Winograd Minimal Filter Implementation in CUDA ☆24 Updated 3 years ago
- cuDNN sample codes provided by Nvidia ☆45 Updated 6 years ago
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance. ☆331 Updated 2 months ago
- how to design cpu gemm on x86 with avx256, that can beat openblas. ☆69 Updated 5 years ago
- Chinese translation of the CUDA PTX-ISA document ☆37 Updated 2 weeks ago
- Transparent Cudnn / Cublas / Eigen usage for the deep learning training using MNIST dataset. ☆17 Updated 4 years ago
- ☆431 Updated 9 years ago
- Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core. ☆58 Updated 6 months ago
- ResNet Implementation, Training, and Inference Using LibTorch C++ API ☆39 Updated 9 months ago
- ICML2017 MEC: Memory-efficient Convolution for Deep Neural Network, C++ implementation (unofficial) ☆17 Updated 5 years ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆127 Updated 4 years ago
- ☆21 Updated 4 years ago
- ☆27 Updated 2 years ago
- Assembler for NVIDIA Volta and Turing GPUs ☆214 Updated 3 years ago
- This is a c++ implementation of an LSTM Neural Network parallelized for a GPU using CUDA ☆24 Updated 7 years ago
- Dissecting NVIDIA GPU Architecture ☆90 Updated 2 years ago
- ☆17 Updated 4 years ago
- matrix multiplication in CUDA ☆122 Updated last year
- Convolutional Neural Network with CUDA (MNIST 99.23%) ☆189 Updated 2 years ago
- ☆95 Updated 3 years ago
- Winograd-based convolution implementation in OpenCL ☆28 Updated 8 years ago
- [MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration ☆196 Updated 2 years ago
- Implementation of convolution layer in different flavors ☆68 Updated 7 years ago
- ☆30 Updated 2 years ago