gunrock / loops
GPU load-balancing library for regular and irregular computations.
⭐ 62 · Updated last year
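As a rough illustration of what "load balancing for irregular computations" can mean, here is a small sketch of one common idea: splitting the nonzeros of a CSR sparse matrix evenly across workers instead of assigning one row per worker. With that split, a single very long row cannot stall one worker while the others sit idle. This is a plain-Python sketch built on assumptions. The CSR layout, the hypothetical helpers `balance_by_nonzeros` and `row_sums_balanced`, and the nonzero-splitting scheme are all illustrative. None of them is taken from the loops API.

```python
from bisect import bisect_right


def balance_by_nonzeros(row_offsets, workers):
    """Hypothetical helper: split a CSR matrix's nonzeros evenly across workers.

    row_offsets[r]..row_offsets[r + 1] indexes the nonzeros of row r.
    Returns (first_row, begin, end) per worker, where [begin, end) is the
    worker's slice of global nonzero indices and first_row owns index begin.
    """
    nnz = row_offsets[-1]
    ranges = []
    for w in range(workers):
        begin = nnz * w // workers
        end = nnz * (w + 1) // workers
        # Last row whose starting offset is <= begin (this skips empty rows).
        first_row = bisect_right(row_offsets, begin) - 1
        ranges.append((first_row, begin, end))
    return ranges


def row_sums_balanced(row_offsets, values, workers):
    """Hypothetical helper: sum each row, visiting nonzeros in balanced slices.

    Workers run one after another here. On a GPU they would run concurrently,
    so a row split between two workers would need atomic adds or a fix-up pass.
    """
    y = [0.0] * (len(row_offsets) - 1)
    for first_row, begin, end in balance_by_nonzeros(row_offsets, workers):
        row = first_row
        for k in range(begin, end):
            while k >= row_offsets[row + 1]:  # advance to the row that owns k
                row += 1
            y[row] += values[k]
    return y


# Skewed 4-row matrix: row 1 holds 6 of the 8 nonzeros, and row 2 is empty.
row_offsets = [0, 1, 7, 7, 8]
values = [1, 2, 2, 2, 2, 2, 2, 3]
print(balance_by_nonzeros(row_offsets, 3))  # [(0, 0, 2), (1, 2, 5), (1, 5, 8)]
print(row_sums_balanced(row_offsets, values, 3))  # [1.0, 12.0, 0.0, 3.0]
```

Each of the three workers receives 2 or 3 nonzeros, even though one row holds most of the work. A row-per-worker mapping would instead leave the worker assigned to row 1 doing six times as much as the others.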
Alternatives and similar repositories for loops
Users who are interested in loops are comparing it to the libraries listed below.
- CUDA Flux is a profiler for GPU applications which reports the basic block execution frequencies of compute kernels · ⭐ 32 · Updated 4 years ago
- Development repository for the Open Earth Compiler · ⭐ 80 · Updated 4 years ago
- Benchmark for measuring the performance of sparse and irregular memory access. · ⭐ 78 · Updated 2 months ago
- An extension library of WMMA API (Tensor Core API) · ⭐ 99 · Updated last year
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. · ⭐ 97 · Updated this week
- GPU Performance Advisor · ⭐ 65 · Updated 3 years ago
- ⭐ 94 · Updated 8 years ago
- ❤️ CUDA/C++ GPU graph analytics simplified. · ⭐ 31 · Updated 2 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth · ⭐ 106 · Updated 7 years ago
- Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite · ⭐ 66 · Updated 6 years ago
- ⭐ 45 · Updated 4 years ago
- ⭐ 51 · Updated 6 years ago
- Implementation and analysis of five different GPU based SPMV algorithms in CUDA · ⭐ 41 · Updated 6 years ago
- Advanced Profiling and Analytics for AMD Hardware · ⭐ 161 · Updated this week
- Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018 · ⭐ 72 · Updated 4 years ago
- ⭐ 18 · Updated 5 years ago
- Efficient SpGEMM on GPU using CUDA and CSR · ⭐ 57 · Updated 2 years ago
- CSR-based SpGEMM on nVidia and AMD GPUs · ⭐ 46 · Updated 9 years ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators · ⭐ 110 · Updated 2 months ago
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks · ⭐ 24 · Updated last year
- Dissecting NVIDIA GPU Architecture · ⭐ 103 · Updated 3 years ago
- Assembler for NVIDIA Volta and Turing GPUs · ⭐ 226 · Updated 3 years ago
- ⭐ 106 · Updated last year
- ⭐ 64 · Updated 6 years ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand-engineered and codegen kernels. · ⭐ 134 · Updated last year
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems · ⭐ 132 · Updated 5 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. · ⭐ 55 · Updated 4 months ago
- Multi-GPU dynamic scheduler using PGAS style cross-GPU communication · ⭐ 29 · Updated 2 years ago
- ⭐ 249 · Updated last month
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. · ⭐ 32 · Updated 4 months ago