gunrock / loops
GPU load-balancing library for regular and irregular computations.
☆62, updated last year
Alternatives and similar repositories for loops
Users who are interested in loops are comparing it to the libraries listed below.
- CUDA Flux is a profiler for GPU applications which reports the basic block execution frequencies of compute kernels ☆32, updated 4 years ago
- ❤️ CUDA/C++ GPU graph analytics simplified. ☆31, updated 2 years ago
- development repository for the open earth compiler ☆80, updated 4 years ago
- ☆46, updated 4 years ago
- GPU Performance Advisor ☆66, updated 3 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆106, updated 8 years ago
- An extension library of WMMA API (Tensor Core API) ☆99, updated last year
- Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite ☆66, updated 6 years ago
- ☆51, updated 6 years ago
- Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018 ☆72, updated 4 years ago
- Efficient SpGEMM on GPU using CUDA and CSR ☆57, updated 2 years ago
- ☆95, updated 8 years ago
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆98, updated last week
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆162, updated last week
- Benchmark for measuring the performance of sparse and irregular memory access. ☆78, updated 3 months ago
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks ☆24, updated last year
- CSR-based SpGEMM on nVidia and AMD GPUs ☆46, updated 9 years ago
- ☆18, updated 5 years ago
- ☆106, updated last year
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆112, updated 3 months ago
- Implementation and analysis of five different GPU based SPMV algorithms in CUDA ☆40, updated 6 years ago
- Dissecting NVIDIA GPU Architecture ☆104, updated 3 years ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆81, updated 5 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆55, updated 5 months ago
- Multi-GPU dynamic scheduler using PGAS style cross-GPU communication ☆29, updated 2 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆35, updated 5 years ago
- ☆64, updated 6 years ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆32, updated 4 months ago
- A Library for fast Hash Tables on GPUs ☆126, updated 3 years ago
- ☆10, updated 2 years ago