Using C++ magic to capture CUDA kernels and tune them with Kernel Tuner
☆21Sep 12, 2025Updated 5 months ago
Alternatives and similar repositories for kernel_launcher
Users who are interested in kernel_launcher compare it to the libraries listed below
- A GPU benchmark suite for autotuners☆19Feb 20, 2024Updated 2 years ago
- A parser for PTX 6.5☆13Jun 19, 2023Updated 2 years ago
- ☆11Jun 9, 2023Updated 2 years ago
- Kernel Tuner☆386Updated this week
- GPUDirect Async implementation of HPGMG-FV CUDA☆11May 11, 2018Updated 7 years ago
- High-Performance Linpack Benchmark adopted version for GPU backend☆12Sep 12, 2022Updated 3 years ago
- ☆11Aug 8, 2021Updated 4 years ago
- ☆17Dec 8, 2023Updated 2 years ago
- GPGPU-SIM usage guide☆14Nov 12, 2022Updated 3 years ago
- A Monte Carlo Neutron Transport Mini-App☆15Apr 15, 2019Updated 6 years ago
- A benchmarking suite for heterogeneous systems. The primary goal of this project is to improve and update aspects of existing benchmarkin…☆43Jan 30, 2026Updated last month
- An HPL-AI implementation for Fugaku☆23Jun 29, 2021Updated 4 years ago
- Data Accelerator: Creates a burst buffer from generic hardware and integrates it with Slurm https://www.hpc.cam.ac.uk/research/data-acc h…☆18Mar 30, 2023Updated 2 years ago
- C++ HPC Math Library☆46Dec 9, 2019Updated 6 years ago
- A GPU FP32 computation method with Tensor Cores.☆26Dec 8, 2025Updated 2 months ago
- Ansible role for OpenHPC☆51Updated this week
- Collection of CUDA benchmarks, with a focus on unified vs. explicit memory management.☆20Oct 15, 2019Updated 6 years ago
- ngAP's artifact for ASPLOS'24☆25Jul 29, 2025Updated 7 months ago
- Rodinia benchmark☆24Jul 5, 2024Updated last year
- GPU Performance Advisor☆66Jul 25, 2022Updated 3 years ago
- 🎃 GPU load-balancing library for regular and irregular computations.☆66Sep 9, 2025Updated 5 months ago
- High Performance Linpack for Next-Generation AMD HPC Accelerators☆67Dec 10, 2025Updated 2 months ago
- A library to benchmark CUDA code, similar to google benchmark.☆30Apr 18, 2021Updated 4 years ago
- Compute applications.☆25Dec 12, 2019Updated 6 years ago
- A BUDE virtual-screening benchmark, in many programming models☆30Oct 15, 2024Updated last year
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆32Apr 2, 2025Updated 10 months ago
- ☆24Jun 24, 2022Updated 3 years ago
- ☆24Nov 14, 2023Updated 2 years ago
- A library for directly calling TensorFlow / Keras ML models from Fortran.☆35Sep 3, 2023Updated 2 years ago
- Create and deploy virtual-experiments - co-processing computational workflows☆10Jan 28, 2026Updated last month
- Training material on writing machine learning code with PyTorch by ICCS☆38Sep 11, 2025Updated 5 months ago
- The translator that supports translating NVPTX to SPIR-V. This translator is modified from LLVM-SPIR-V Translator.☆44Oct 25, 2021Updated 4 years ago
- A command line utility to manage the configuration of a system's high performance network interfaces for RoCE deployments☆35Jul 25, 2023Updated 2 years ago
- cuASR: CUDA Algebra for Semirings☆44Aug 22, 2022Updated 3 years ago
- ☆31Aug 28, 2020Updated 5 years ago
- A hands-on introduction to tuning GPU kernels using Kernel Tuner https://github.com/KernelTuner/kernel_tuner/☆37Oct 29, 2025Updated 4 months ago
- ☆53Updated this week
- Examples on how to make use of DestinE Data Lake services☆14Feb 20, 2026Updated last week
- ext_mpi_collectives☆11Apr 1, 2025Updated 10 months ago