Using C++ magic to capture CUDA kernels and tune them with Kernel Tuner
☆21 · Sep 12, 2025 · Updated 6 months ago
Alternatives and similar repositories for kernel_launcher
Users who are interested in kernel_launcher are comparing it to the repositories listed below
- A GPU benchmark suite for autotuners ☆19 · Feb 20, 2024 · Updated 2 years ago
- Kernel Tuner ☆389 · Updated this week
- A parser for PTX 6.5 ☆13 · Jun 19, 2023 · Updated 2 years ago
- ☆11 · Jun 9, 2023 · Updated 2 years ago
- GPUDirect Async implementation of HPGMG-FV CUDA ☆11 · May 11, 2018 · Updated 7 years ago
- PyTorch block-diagonal ODE CUDA solver, designed for gradient-based optimization ☆16 · Apr 27, 2020 · Updated 5 years ago
- ☆11 · Aug 8, 2021 · Updated 4 years ago
- A rust wrapper for HIP ☆12 · Jun 10, 2025 · Updated 9 months ago
- High-performance graph processing on hybrid CPU-GPU platforms by using dynamic load-balancing ☆12 · Sep 15, 2016 · Updated 9 years ago
- High-Performance Linpack Benchmark adopted version for GPU backend ☆12 · Sep 12, 2022 · Updated 3 years ago
- Instructions and templates for SC authors ☆17 · Aug 22, 2021 · Updated 4 years ago
- Rodinia benchmark ☆24 · Jul 5, 2024 · Updated last year
- CLTune: An automatic OpenCL & CUDA kernel tuner ☆185 · Dec 12, 2022 · Updated 3 years ago
- ☆17 · Dec 8, 2023 · Updated 2 years ago
- A GPU FP32 computation method with Tensor Cores. ☆26 · Dec 8, 2025 · Updated 3 months ago
- C++ HPC Math Library ☆47 · Dec 9, 2019 · Updated 6 years ago
- A Monte Carlo Neutron Transport Mini-App ☆15 · Apr 15, 2019 · Updated 6 years ago
- Vikunja is a performance portable algorithm library that defines functions operating on ranges of elements for a variety of purposes. It… ☆16 · Oct 10, 2023 · Updated 2 years ago
- An HPL-AI implementation for Fugaku ☆23 · Jun 29, 2021 · Updated 4 years ago
- A Brainfuck to binary compiler using LLVM, written in OCaml. ☆20 · Apr 24, 2023 · Updated 2 years ago
- HPC Game Platform ☆11 · Apr 20, 2023 · Updated 2 years ago
- Collection of CUDA benchmarks, with a focus on unified vs. explicit memory management. ☆20 · Oct 15, 2019 · Updated 6 years ago
- ngAP's artifact for ASPLOS'24 ☆26 · Jul 29, 2025 · Updated 7 months ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆31 · Apr 2, 2025 · Updated 11 months ago
- Parse objdump files using tree-sitter ☆13 · Nov 22, 2023 · Updated 2 years ago
- A library to benchmark CUDA code, similar to google benchmark. ☆31 · Apr 18, 2021 · Updated 4 years ago
- Ansible role for OpenHPC ☆51 · Mar 2, 2026 · Updated 2 weeks ago
- GPU Performance Advisor ☆66 · Jul 25, 2022 · Updated 3 years ago
- CUDA and OpenMP implementations of C2R/R2C inplace transposition ☆48 · Feb 10, 2015 · Updated 11 years ago
- Collection of full, mini, proxy, and benchmark apps. ☆11 · Feb 14, 2020 · Updated 6 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆57 · Mar 20, 2025 · Updated last year
- A BUDE virtual-screening benchmark, in many programming models ☆30 · Oct 15, 2024 · Updated last year
- A mini-app to solve the heat conduction equation ☆15 · Jul 1, 2020 · Updated 5 years ago
- 🎃 GPU load-balancing library for regular and irregular computations. ☆66 · Sep 9, 2025 · Updated 6 months ago
- ☆31 · Aug 28, 2020 · Updated 5 years ago
- micro editor plugin that provides zig fmt integration ☆15 · Jul 1, 2022 · Updated 3 years ago
- cuASR: CUDA Algebra for Semirings ☆45 · Aug 22, 2022 · Updated 3 years ago
- outline and links for PLDI 2022 tutorial ☆17 · Jun 13, 2022 · Updated 3 years ago
- Command-line based calculator written in Rust ☆29 · Feb 17, 2021 · Updated 5 years ago