KernelTuner / kernel_tuner_tutorial
A hands-on introduction to tuning GPU kernels using Kernel Tuner https://github.com/KernelTuner/kernel_tuner/
☆29 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for kernel_tuner_tutorial
- NPBench - A Benchmarking Suite for High-Performance NumPy ☆73 · Updated this week
- Repository with examples and exercises for OLCF and AMD's HIP training series ☆14 · Updated last year
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆44 · Updated last month
- Training examples for SYCL ☆38 · Updated last week
- Legate Sparse is a Legate library that aims to provide a distributed and accelerated drop-in replacement for the scipy.sparse library on … ☆14 · Updated this week
- A C++-based implementation of the TeaLeaf heat conduction mini-app. This implementation of TeaLeaf replicates the functionality of the ref… ☆22 · Updated 3 months ago
- OpenMP Tutorial ☆9 · Updated 5 months ago
- Data and reproducibility scripts for the UoB-HPC Performance Portability studies ☆14 · Updated 5 months ago
- ALCF Computational Performance Workshop ☆34 · Updated 2 years ago
- The Foundation for All Legate Libraries ☆193 · Updated this week
- Benchmark implementation of CosmoFlow in TensorFlow Keras ☆20 · Updated 9 months ago
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support ☆48 · Updated 3 months ago
- Intermediate MPI lesson ☆26 · Updated last year
- SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) sy… ☆93 · Updated 3 weeks ago
- Reference implementations of MLPerf™ HPC training benchmarks ☆42 · Updated 5 months ago
- General Purpose Timing Library ☆32 · Updated 6 months ago
- E4S for Spack ☆30 · Updated this week
- Benchmarks ☆15 · Updated last month
- N-Ways to Multi-GPU Programming ☆15 · Updated last year
- Graph-indexed Pandas DataFrames for analyzing hierarchical performance data ☆30 · Updated 3 weeks ago
- Very-Low Overhead Checkpointing System ☆54 · Updated 3 weeks ago
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆188 · Updated this week
- An Online Deep Learning Interface for HPC programs on NVIDIA GPUs ☆156 · Updated 2 weeks ago
- An HPL-AI implementation for Fugaku ☆19 · Updated 3 years ago
- OpenACC* to OpenMP* API assisting migration tool ☆32 · Updated 3 weeks ago
- GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development. ☆35 · Updated 2 months ago
- NVIDIA curated collection of educational resources related to general purpose GPU programming. ☆139 · Updated this week