xnd-project / cuda-benchmarks

Collection of CUDA benchmarks, with a focus on unified vs. explicit memory management.

☆20

Alternatives and similar repositories for cuda-benchmarks

Users who are interested in cuda-benchmarks are comparing it to the repositories listed below.

NVlabs / cub
THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.
☆84 Updated last year
cjmcv / hpc
Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.)
☆60 Updated 4 months ago
matazure / mtensor
A C++/CUDA template library for lazy tensor evaluation
☆162 Updated 2 years ago
XiuYuLi / flexible-gemm
flexible-gemm conv of deepcore
☆17 Updated 5 years ago
ekondis / gpumembench
A GPU benchmark suite for assessing on-chip GPU memory bandwidth
☆106 Updated 7 years ago
Forwil / tvmt_v2
☆10 Updated 5 years ago
ysh329 / OpenCL-101
Learn OpenCL step by step.
☆138 Updated 2 years ago
xuqiantong / CUDA-Winograd
Fast CUDA Kernels for ResNet Inference.
☆177 Updated 6 years ago
mark-poscablo / gpu-sum-reduction
CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable.
☆37 Updated 8 years ago
OrangeOwlSolutions / General-CUDA-programming
☆44 Updated 7 years ago
tlc-pack / tophub
tophub autotvm log collections
☆70 Updated 2 years ago
BBuf / how-to-optimize-gemm
☆97 Updated 4 years ago
CSshengxy / MEC
ICML 2017 MEC: Memory-efficient Convolution for Deep Neural Network, C++ implementation (unofficial)
☆17 Updated 6 years ago
vinx13 / tvm-cuda-int8-benchmark
Benchmark of TVM quantized model on CUDA
☆111 Updated 5 years ago
ap-hynninen / cutt
CUDA Tensor Transpose (cuTT) library
☆52 Updated 7 years ago
cwpearson / nvidia-performance-tools
Instructions, Docker images, and examples for Nsight Compute and Nsight Systems
☆131 Updated 5 years ago
codeplaysoftware / portDNN
portDNN is a library implementing neural network algorithms written using SYCL
☆113 Updated last year
alibaba / heterogeneity-aware-lowering-and-optimization
heterogeneity-aware-lowering-and-optimization
☆255 Updated last year
hma02 / cublasHgemm-P100
Code for testing the native float16 matrix multiplication performance on Tesla P100 and V100 GPU based on cublasHgemm
☆34 Updated 5 years ago
vancemiller / CUDA-preemption
Experiments evaluating preemption on the NVIDIA Pascal architecture
☆17 Updated 8 years ago
FrozenGene / tvm-tutorial
TVM tutorial
☆66 Updated 6 years ago
passlab / CUDAMicroBench
☆42 Updated last month
weifengliu-ssslab / Benchmark_SpGEMM_using_CSR
CSR-based SpGEMM on nVidia and AMD GPUs
☆46 Updated 9 years ago
merrymercy / tvm-mali
Optimizing Mobile Deep Learning on ARM GPU with TVM
☆181 Updated 6 years ago
ndd314 / cuda_examples
☆68 Updated 11 years ago
anilshanbhag / gpu-topk
Efficient Top-K implementation on the GPU
☆183 Updated 6 years ago
yester31 / Cutlass_EX
study of cutlass
☆22 Updated 8 months ago
yuxianzhi / Top-K
A way to use CUDA to accelerate the top-k algorithm
☆29 Updated 8 years ago
fernandoc1 / Benchmarking-CUDA
A quick way to benchmark your CUDA compiler on a Linux environment
☆26 Updated 14 years ago
ekondis / mixbench
A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)
☆412 Updated 6 months ago