ROCm / rocSHMEM
rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
☆50 · Updated this week
Alternatives and similar repositories for rocSHMEM:
Users who are interested in rocSHMEM are comparing it to the libraries listed below.
- High Performance Linpack for Next-Generation AMD HPC Accelerators ☆46 · Updated last week
- Advanced Profiling and Analytics for AMD Hardware ☆140 · Updated this week
- TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs) ☆39 · Updated this week
- Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite ☆64 · Updated 6 years ago
- ROCm SPARSE marshalling library ☆67 · Updated this week
- RCCL Performance Benchmark Tests ☆59 · Updated last month
- HPCG benchmark based on ROCm platform ☆37 · Updated last month
- LLVM/MLIR based compiler instrumentation of AMD GPU kernels ☆17 · Updated this week
- Bandwidth test for ROCm ☆54 · Updated this week
- Next generation SPARSE implementation for ROCm platform ☆119 · Updated this week
- oneAPI Level Zero Conformance & Performance test content ☆48 · Updated last week
- ROC profiler library. Profiling with perf-counters and derived metrics. ☆135 · Updated this week
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆32 · Updated 4 years ago
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆79 · Updated this week
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆48 · Updated 4 months ago
- CUDA Flux is a profiler for GPU applications which reports the basic block execution frequencies of compute kernels ☆32 · Updated 3 years ago
- Performance Prediction Toolkit ☆51 · Updated 2 months ago
- GPU Performance Advisor ☆64 · Updated 2 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 · Updated 7 years ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆73 · Updated last year
- 🎃 GPU load-balancing library for regular and irregular computations. ☆60 · Updated 8 months ago
- A tracing infrastructure for heterogeneous computing applications. ☆29 · Updated this week
- MPI accelerator-integrated communication extensions ☆32 · Updated last year
- An extension library of WMMA API (Tensor Core API) ☆88 · Updated 7 months ago
- Sandia OpenSHMEM is an implementation of the OpenSHMEM specification over multiple Networking APIs, including Portals 4, the Open Fabric … ☆64 · Updated this week