ekondis / cl2-reduce-bench
A test case for evaluating the performance of the workgroup reduction operation in OpenCL 2.0
☆9 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for cl2-reduce-bench
- Kernel Tuning Toolkit ☆55 · Updated 3 weeks ago
- BLAS OpenCL implementation. ☆15 · Updated 9 years ago
- Build NVIDIA® CUDA™ code for OpenCL™ 1.2 devices ☆11 · Updated 7 years ago
- MIOpenGEMM is now deprecated ☆61 · Updated last year
- amdgpu example code in hip/asm ☆21 · Updated 2 weeks ago
- A portable high-level API with CUDA or OpenCL back-end ☆54 · Updated 7 years ago
- CLTune: An automatic OpenCL & CUDA kernel tuner ☆170 · Updated last year
- CL Offline Compiler : Compile OpenCL kernels to HSAIL ☆50 · Updated 7 years ago
- Python tools for NVIDIA Profiler ☆21 · Updated 6 years ago
- ROCm OpenCL Compiler Tool Driver ☆24 · Updated 5 years ago
- An implementation of BLAS using the SYCL open standard. ☆259 · Updated 3 weeks ago
- ROCm Device Libraries ☆98 · Updated 6 months ago
- TTC: A high-performance Compiler for Tensor Transpositions ☆20 · Updated 7 years ago
- Source for Demystifying GPU Microarchitecture through Microbenchmarking ☆16 · Updated last year
- Intel® GPU Compute Samples ☆97 · Updated this week
- RAND library for HIP programming language ☆111 · Updated this week
- HIP back-end for Thrust that has been replaced by rocThrust ☆28 · Updated last year
- MLIRX is now defunct. Please see PolyBlocks - https://docs.polymagelabs.com ☆38 · Updated 11 months ago
- A system validation and diagnostics tool for monitoring, stress testing, detecting, and troubleshooting issues impacting AMD GPUs in high… ☆66 · Updated this week
- An explanation of the (for me) missing parts of the sgemm in Scott Gray's maxas assembler: https://github.com/NervanaSystems/maxas ☆13 · Updated 5 years ago
- This repository contains my experiments with compression-related algorithms ☆35 · Updated 8 years ago
- OpenCL/SPIR-V implementation of HIP ☆104 · Updated 2 years ago
- ☆146 · Updated last week
- ROCm's Thunk Interface ☆83 · Updated 2 weeks ago
- ☆118 · Updated 11 years ago
- Experimental Linear Algebra Performance Studies ☆12 · Updated 7 years ago
- Next generation FFT implementation for ROCm ☆176 · Updated this week
- Fork of magma to include more BLAS ☆28 · Updated 7 years ago
- CUDA and OpenMP implementations of C2R/R2C inplace transposition ☆45 · Updated 9 years ago
- C Framework for OpenCL ☆108 · Updated 10 months ago