KernelTuner / kernel_tuner
Kernel Tuner
☆344 · Updated this week
Alternatives and similar repositories for kernel_tuner
Users who are interested in kernel_tuner compare it to the libraries listed below.
- CUDA Kernel Benchmarking Library ☆666 · Updated last week
- Assembler for NVIDIA Volta and Turing GPUs ☆221 · Updated 3 years ago
- Collection of benchmarks to measure basic GPU capabilities ☆384 · Updated 4 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆245 · Updated this week
- ☆542 · Updated last week
- STREAM, for lots of devices written in many programming models ☆343 · Updated 9 months ago
- CUDA Matrix Multiplication Optimization ☆194 · Updated 11 months ago
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆206 · Updated last month
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand-engineered and codegen kernels ☆133 · Updated last year
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆274 · Updated last week
- Experimental projects related to TensorRT ☆105 · Updated this week
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆261 · Updated 5 months ago
- A library of GPU kernels for sparse matrix operations ☆265 · Updated 4 years ago
- Shared Middle-Layer for Triton Compilation ☆255 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆423 · Updated this week
- Advanced Profiling and Analytics for AMD Hardware ☆156 · Updated this week
- ☆62 · Updated 6 months ago
- ☆247 · Updated last week
- CLTune: An automatic OpenCL & CUDA kernel tuner ☆179 · Updated 2 years ago
- Step-by-step optimization of CUDA SGEMM ☆339 · Updated 3 years ago
- Fastest kernels written from scratch ☆281 · Updated 2 months ago
- An extension library of WMMA API (Tensor Core API) ☆99 · Updated 11 months ago
- rocWMMA ☆115 · Updated this week
- ☆98 · Updated last year
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆98 · Updated last month
- ROCm BLAS marshalling library ☆144 · Updated this week
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems ☆132 · Updated 5 years ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆337 · Updated this week
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆737 · Updated 4 months ago
- Training material for Nsight developer tools ☆159 · Updated 10 months ago