gunrock / loops
GPU load-balancing library for regular and irregular computations.
★62 Updated last month
Alternatives and similar repositories for loops
Users who are interested in loops are comparing it to the libraries listed below.
- development repository for the open earth compiler ★80 Updated 4 years ago
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ★121 Updated this week
- ★48 Updated 5 years ago
- CUDA/C++ GPU graph analytics simplified. ★31 Updated 3 years ago
- CUDA Flux is a profiler for GPU applications which reports the basic block execution frequencies of compute kernels ★32 Updated 4 years ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo ★166 Updated last week
- An extension library of WMMA API (Tensor Core API) ★106 Updated last year
- GPU Performance Advisor ★65 Updated 3 years ago
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks ★25 Updated last year
- ★109 Updated last year
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ★107 Updated 8 years ago
- ★50 Updated 6 years ago
- Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite ★66 Updated 7 years ago
- Benchmark for measuring the performance of sparse and irregular memory access. ★79 Updated 2 months ago
- Efficient SpGEMM on GPU using CUDA and CSR ★57 Updated 2 years ago
- ★93 Updated 8 years ago
- Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018 ★73 Updated 5 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ★56 Updated 7 months ago
- Implementation and analysis of five different GPU based SPMV algorithms in CUDA ★40 Updated 6 years ago
- ★19 Updated 5 years ago
- ★10 Updated 2 years ago
- ★286 Updated last month
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ★134 Updated 2 years ago
- CSR-based SpGEMM on nVidia and AMD GPUs ★46 Updated 9 years ago
- Multi-GPU dynamic scheduler using PGAS style cross-GPU communication ★29 Updated 2 years ago
- Dissecting NVIDIA GPU Architecture ★109 Updated 3 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ★73 Updated 2 years ago
- ★63 Updated 10 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ★118 Updated 5 months ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU ★82 Updated 6 years ago