ekondis / mixbench
A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)
☆392 Updated 3 months ago
Alternatives and similar repositories for mixbench:
Users who are interested in mixbench are comparing it to the libraries listed below.
- Stretching GPU performance for GEMMs and tensor contractions. ☆235 Updated this week
- A tool which profiles OpenCL devices to find their peak capacities ☆437 Updated 3 months ago
- CUDA Kernel Benchmarking Library ☆618 Updated this week
- Examples for HIP ☆204 Updated 4 months ago
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆261 Updated 3 months ago
- The SHOC Benchmark Suite ☆251 Updated 3 years ago
- Assembler for NVIDIA Volta and Turing GPUs ☆216 Updated 3 years ago
- STREAM, for lots of devices written in many programming models ☆332 Updated 7 months ago
- Next generation BLAS implementation for ROCm platform ☆362 Updated this week
- oneAPI Collective Communications Library (oneCCL) ☆232 Updated last week
- collection of benchmarks to measure basic GPU capabilities ☆354 Updated 2 months ago
- ROCm Communication Collectives Library (RCCL) ☆317 Updated this week
- Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysi… ☆225 Updated last week
- ROCm Device Libraries ☆97 Updated 11 months ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 Updated 7 years ago
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆678 Updated last month
- ROCm BLAS marshalling library ☆136 Updated this week
- ☆240 Updated 2 months ago
- CLTune: An automatic OpenCL & CUDA kernel tuner ☆177 Updated 2 years ago
- A tool for bandwidth measurements on NVIDIA GPUs. ☆401 Updated 2 months ago
- Next generation FFT implementation for ROCm ☆190 Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆376 Updated this week
- rocWMMA ☆106 Updated this week
- ROC profiler library. Profiling with perf-counters and derived metrics. ☆141 Updated this week
- ROCm Parallel Primitives ☆171 Updated this week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆130 Updated last year
- Intercept Layer for Debugging and Analyzing OpenCL Applications ☆327 Updated last week
- ☆61 Updated 3 months ago
- HIPIFY: Convert CUDA to Portable C++ Code ☆571 Updated this week
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆369 Updated last week