gunrock / loops
GPU load-balancing library for regular and irregular computations.
★ 62 · Updated this week
Alternatives and similar repositories for loops
Users who are interested in loops are comparing it to the libraries listed below.
- Development repository for the Open Earth Compiler ★ 80 · Updated 4 years ago
- ★ 47 · Updated 5 years ago
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ★ 112 · Updated this week
- GPU Performance Advisor ★ 66 · Updated 3 years ago
- CUDA Flux is a profiler for GPU applications that reports the basic-block execution frequencies of compute kernels. ★ 32 · Updated 4 years ago
- An extension library of the WMMA API (Tensor Core API) ★ 104 · Updated last year
- ★ 50 · Updated 6 years ago
- ❤️ CUDA/C++ GPU graph analytics simplified. ★ 31 · Updated 2 years ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo ★ 162 · Updated last week
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ★ 106 · Updated 8 years ago
- ★ 18 · Updated 5 years ago
- ★ 95 · Updated 8 years ago
- ★ 107 · Updated last year
- Implementation and analysis of five different GPU-based SpMV algorithms in CUDA ★ 41 · Updated 6 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ★ 72 · Updated 2 years ago
- Code for the paper "Design Principles for Sparse Matrix Multiplication on the GPU", accepted to Euro-Par 2018 ★ 73 · Updated 4 years ago
- Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite ★ 66 · Updated 7 years ago
- Third-party assembler and GEMM library for NVIDIA Kepler GPUs ★ 82 · Updated 5 years ago
- Benchmark for measuring the performance of sparse and irregular memory access. ★ 79 · Updated 3 weeks ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ★ 32 · Updated 5 months ago
- ★ 31 · Updated 3 years ago
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks ★ 24 · Updated last year
- MatMul performance benchmarks for a single CPU core, comparing both hand-engineered and codegen kernels. ★ 134 · Updated last year
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ★ 113 · Updated 3 months ago
- ★ 149 · Updated this week
- Dissecting NVIDIA GPU Architecture ★ 106 · Updated 3 years ago
- Assembler for NVIDIA Volta and Turing GPUs ★ 229 · Updated 3 years ago
- ★ 62 · Updated 8 months ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo ★ 31 · Updated last week
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ★ 140 · Updated 5 years ago