xmartlabs / cuda-calculator

Online CUDA Occupancy Calculator

☆79

Alternatives and similar repositories for cuda-calculator

Users who are interested in cuda-calculator are comparing it to the libraries listed below.

NVlabs / NVBit
☆270 Updated 2 months ago
ekondis / gpumembench
A GPU benchmark suite for assessing on-chip GPU memory bandwidth
☆106 Updated 7 years ago
daadaada / turingas
Assembler for NVIDIA Volta and Turing GPUs
☆226 Updated 3 years ago
gunrock / loops
🎃 GPU load-balancing library for regular and irregular computations.
☆62 Updated last year
cwpearson / nvidia-performance-tools
Instructions, Docker images, and examples for Nsight Compute and Nsight Systems
☆131 Updated 5 years ago
c3sr / tcu_scope
☆51 Updated 6 years ago
uuudown / Tartan
Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite
☆65 Updated 6 years ago
wmmae / wmma_extension
An extension library of WMMA API (Tensor Core API)
☆99 Updated last year
NVIDIA / compute-sanitizer-samples
Samples demonstrating how to use the Compute Sanitizer Tools and Public API
☆85 Updated last year
GPUPeople / spECK
Efficient SpGEMM on GPU using CUDA and CSR
☆57 Updated 2 years ago
HAWAIILAB / cuda-flux
CUDA Flux is a profiler for GPU applications that reports the basic block execution frequencies of compute kernels
☆32 Updated 4 years ago
ROCm / rocSHMEM
rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
☆98 Updated this week
poojahira / spmv-cuda
Implementation and analysis of five different GPU-based SPMV algorithms in CUDA
☆41 Updated 6 years ago
sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆103 Updated 3 years ago
PAA-NCIC / PPoPP2017_artifact
Third-party assembler and GEMM library for NVIDIA Kepler GPUs
☆81 Updated 5 years ago
Jokeren / GPA
GPU Performance Advisor
☆65 Updated 3 years ago
wzsh / wmma_tensorcore_sample
Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)
☆138 Updated 4 years ago
sunlex0717 / DissectingTensorCores
☆106 Updated last year
RRZE-HPC / gpu-benches
Collection of benchmarks to measure basic GPU capabilities
☆401 Updated 5 months ago
mark-poscablo / gpu-sum-reduction
CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable.
☆37 Updated 8 years ago
ap-hynninen / cutt
CUDA Tensor Transpose (cuTT) library
☆52 Updated 7 years ago
chai-benchmarks / chai
Chai
☆45 Updated last year
hpcgarage / spatter
Benchmark for measuring the performance of sparse and irregular memory access.
☆78 Updated 3 months ago
NVIDIA / nsight-training
Training material for Nsight developer tools
☆163 Updated last year
intel / xetla
☆62 Updated 7 months ago
intel / cutlass-sycl
A CUTLASS implementation using SYCL
☆32 Updated this week
mmperf / mmperf
MatMul Performance Benchmarks for a Single CPU Core, comparing both hand-engineered and codegen kernels.
☆134 Updated last year
dumerrill / merge-spmv
☆94 Updated 8 years ago
spcl / open-earth-compiler
Development repository for the Open Earth Compiler
☆80 Updated 4 years ago
north-numerical-computing / tensor-cores-numerical-behavior
Test suite for probing the numerical behavior of NVIDIA tensor cores
☆40 Updated last year