gunrock / loops
GPU load-balancing library for regular and irregular computations.
☆62 · Updated last year
Alternatives and similar repositories for loops
Users who are interested in loops are comparing it to the libraries listed below.
- ❤️ CUDA/C++ GPU graph analytics simplified. · ☆31 · Updated 2 years ago
- ☆44 · Updated 4 years ago
- An extension library of WMMA API (Tensor Core API) · ☆99 · Updated 11 months ago
- ☆51 · Updated 5 years ago
- GPU Performance Advisor · ☆65 · Updated 2 years ago
- ☆98 · Updated last year
- CUDA Flux is a profiler for GPU applications which reports the basic block executions frequencies of compute kernels · ☆32 · Updated 4 years ago
- Dissecting NVIDIA GPU Architecture · ☆96 · Updated 2 years ago
- Implementation and analysis of five different GPU based SPMV algorithms in CUDA · ☆40 · Updated 6 years ago
- development repository for the open earth compiler · ☆80 · Updated 4 years ago
- Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018 · ☆72 · Updated 4 years ago
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. · ☆90 · Updated this week
- Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite · ☆65 · Updated 6 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth · ☆105 · Updated 7 years ago
- Benchmark for measuring the performance of sparse and irregular memory access. · ☆78 · Updated last month
- Advanced Profiling and Analytics for AMD Hardware · ☆156 · Updated this week
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks · ☆23 · Updated last year
- A repository where GPU applications are aggregated using a common build flow that supports multiple CUDA versions. · ☆66 · Updated 3 weeks ago
- ☆18 · Updated 5 years ago
- CSR-based SpGEMM on nVidia and AMD GPUs · ☆46 · Updated 9 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA · ☆32 · Updated 4 years ago
- ☆44 · Updated 4 years ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. · ☆133 · Updated last year
- ☆40 · Updated this week
- ☆148 · Updated this week
- Efficient SpGEMM on GPU using CUDA and CSR · ☆56 · Updated last year
- A GPU algorithm for sparse matrix-matrix multiplication · ☆71 · Updated 4 years ago
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. · ☆134 · Updated last week
- ☆91 · Updated 8 years ago
- LLVM/MLIR based compiler instrumentation of AMD GPU kernels · ☆18 · Updated last month