KernelTuner / kernel_tuner
Kernel Tuner
☆331 · Updated this week
Alternatives and similar repositories for kernel_tuner:
Users who are interested in kernel_tuner are comparing it to the libraries listed below:
- CUDA Kernel Benchmarking Library ☆629 · Updated this week
- Collection of benchmarks to measure basic GPU capabilities ☆369 · Updated 2 months ago
- Assembler for NVIDIA Volta and Turing GPUs ☆218 · Updated 3 years ago
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆256 · Updated last month
- Experimental projects related to TensorRT ☆99 · Updated this week
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆697 · Updated 2 months ago
- An extension library of WMMA API (Tensor Core API) ☆96 · Updated 9 months ago
- Stretching GPU performance for GEMMs and tensor contractions. ☆237 · Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆323 · Updated this week
- CUDA Matrix Multiplication Optimization ☆184 · Updated 9 months ago
- rocWMMA ☆110 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆390 · Updated this week
- Step-by-step optimization of CUDA SGEMM ☆314 · Updated 3 years ago
- Training material for Nsight developer tools ☆157 · Updated 8 months ago
- Advanced Profiling and Analytics for AMD Hardware ☆152 · Updated this week
- CLTune: An automatic OpenCL & CUDA kernel tuner ☆178 · Updated 2 years ago
- ROCm Communication Collectives Library (RCCL) ☆330 · Updated this week
- ☆537 · Updated this week
- GPUOcelot: A dynamic compilation framework for PTX ☆187 · Updated 2 months ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆81 · Updated last year
- The Foundation for All Legate Libraries ☆216 · Updated this week
- Examples for HIP ☆205 · Updated 5 months ago
- ☆251 · Updated this week
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆261 · Updated 3 months ago
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆205 · Updated last month
- STREAM, for lots of devices written in many programming models ☆334 · Updated 8 months ago
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆397 · Updated 3 months ago
- A library of GPU kernels for sparse matrix operations. ☆264 · Updated 4 years ago
- An unofficial CUDA assembler, for all generations of SASS, hopefully :) ☆493 · Updated 2 years ago
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems ☆131 · Updated 4 years ago