Hardware-Alchemy / cuDNN-sampleLinks

cuDNN sample codes provided by Nvidia

☆46

Alternatives and similar repositories for cuDNN-sample

Users that are interested in cuDNN-sample are comparing it to the libraries listed below

Sorting:

anilshanbhag / gpu-topk
Efficient Top-K implementation on the GPU
☆183Updated 6 years ago
leimao / CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
☆213Updated last year
wzsh / wmma_tensorcore_sample
Matrix Multiply-Accumulate with CUDA and WMMA( Tensor Core)
☆138Updated 4 years ago
yalue / cuda_scheduling_examiner_mirror
A tool for examining GPU scheduling behavior.
☆86Updated 11 months ago
NVIDIA / nsight-training
Training material for Nsight developer tools
☆163Updated 11 months ago
wmmae / wmma_extension
An extension library of WMMA API (Tensor Core API)
☆99Updated last year
sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆103Updated 3 years ago
sunlex0717 / DissectingTensorCores
☆106Updated last year
daadaada / turingas
Assembler for NVIDIA Volta and Turing GPUs
☆226Updated 3 years ago
mit-han-lab / inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
☆200Updated 3 years ago
xuqiantong / CUDA-Winograd
Fast CUDA Kernels for ResNet Inference.
☆177Updated 6 years ago
NVIDIA / online-softmax
Benchmark code for the "Online normalizer calculation for softmax" paper
☆95Updated 7 years ago
LeiWang1999 / tvm_gpu_gemm
play gemm with tvm
☆91Updated 2 years ago
cwpearson / nvidia-performance-tools
Instructions, Docker images, and examples for Nsight Compute and Nsight Systems
☆131Updated 5 years ago
haanjack / mnist-cudnn
CUDA for MNIST training/inference
☆42Updated last year
Bruce-Lee-LY / cuda_hgemv
Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.
☆63Updated 10 months ago
ROCm / composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
☆443Updated this week
tlc-pack / tophub
tophub autotvm log collections
☆70Updated 2 years ago
ekondis / gpumembench
A GPU benchmark suite for assessing on-chip GPU memory bandwidth
☆106Updated 7 years ago
njuhope / cuda_sgemm
☆113Updated last year
mmperf / mmperf
MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.
☆134Updated last year
masahi / tvm-cutlass-eval
☆40Updated 3 years ago
ColfaxResearch / cfx-article-src
☆127Updated 2 months ago
passlab / CUDAMicroBench
☆42Updated last month
apache / tvm-rfcs
A home for the final text of all TVM RFCs.
☆105Updated 10 months ago
pku-liang / FlexTensor
Automatic Schedule Exploration and Optimization Framework for Tensor Computations
☆177Updated 3 years ago
NVlabs / cub
THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.
☆84Updated last year
UDC-GAC / openCNN
A Winograd Minimal Filter Implementation in CUDA
☆25Updated 3 years ago
Cjkkkk / CUDA_gemm
A simple high performance CUDA GEMM implementation.
☆392Updated last year
lixiuhong / batched_gemm
☆39Updated 5 years ago