KernelTuner / kernel_tuner_tutorial
A hands-on introduction to tuning GPU kernels using Kernel Tuner https://github.com/KernelTuner/kernel_tuner/
☆30Updated 2 weeks ago
Alternatives and similar repositories for kernel_tuner_tutorial:
Users who are interested in kernel_tuner_tutorial are comparing it to the repositories listed below
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA-accelerated HPC systems.☆51Updated last week
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.☆44Updated last week
- Graph-indexed Pandas DataFrames for analyzing hierarchical performance data☆32Updated 5 months ago
- JUPITER Benchmark Suite☆16Updated 8 months ago
- NPBench - A Benchmarking Suite for High-Performance NumPy☆80Updated 2 weeks ago
- Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceler…☆29Updated 9 months ago
- N-Ways to Multi-GPU Programming☆21Updated 2 years ago
- A C++-based implementation of the TeaLeaf heat conduction mini-app. This implementation of TeaLeaf replicates the functionality of the ref…☆23Updated 8 months ago
- ☆18Updated 5 years ago
- Analyze parallel execution traces using pandas dataframes☆22Updated this week
- OpenMP Tutorial☆9Updated 3 months ago
- ☆48Updated last week
- An open collaborative repository for reproducible specifications of HPC benchmarks and cross site benchmarking environments☆38Updated this week
- COCCL: Compression and precision co-aware collective communication library☆22Updated last month
- Benchmark implementation of CosmoFlow in TensorFlow Keras☆21Updated last year
- A parallel framework for training deep neural networks☆58Updated last month
- Repository with examples and exercises for OLCF and AMD's HIP training series☆16Updated last year
- ☆14Updated 2 years ago
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support☆53Updated last month
- ALCF Computational Performance Workshop☆37Updated 2 years ago
- Highly Efficient FFT for Exascale☆37Updated 11 months ago
- A tracing infrastructure for heterogeneous computing applications.☆31Updated this week
- Resources for the introductory GPU programming course of the HPC-AI specialized master's program☆22Updated last year
- Using C++ magic to launch/capture CUDA kernels and tune them with Kernel Tuner☆20Updated 11 months ago
- OpenMP vs Offload☆21Updated last year
- C++ HPC Tutorial materials☆49Updated 9 months ago
- Analyze graph/hierarchical performance data using pandas dataframes☆113Updated 2 months ago
- ☆15Updated 2 weeks ago
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.☆76Updated last week
- Training examples for SYCL☆40Updated last week