ROCm / rocSHMEM

rocSHMEM is an intra-kernel networking runtime for AMD dGPUs on the ROCm platform.

☆97

Alternatives and similar repositories for rocSHMEM

Users who are interested in rocSHMEM are comparing it to the libraries listed below.

ROCm / rccl-tests
RCCL Performance Benchmark Tests
☆71 Updated this week
merthidayetoglu / HiCCL
A hierarchical collective communications library with portable optimizations
☆36 Updated 7 months ago
uuudown / Tartan
Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite
☆66 Updated 6 years ago
ROCm / rocprofiler-compute
Advanced Profiling and Analytics for AMD Hardware
☆161 Updated this week
ROCm / rocprofiler
ROC profiler library. Profiling with perf-counters and derived metrics.
☆151 Updated 2 weeks ago
GVProf / GVProf
GVProf: A Value Profiler for GPU-based Clusters
☆51 Updated last year
wmmae / wmma_extension
An extension library of WMMA API (Tensor Core API)
☆99 Updated last year
gpudirect / libgdsync
GPUDirect Async support for IB Verbs
☆128 Updated 2 years ago
c3sr / tcu_scope
☆51 Updated 6 years ago
Jokeren / GPA
GPU Performance Advisor
☆65 Updated 3 years ago
sunlex0717 / DissectingTensorCores
☆106 Updated last year
codyjrivera / tsm2x-imp
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA
☆35 Updated 5 years ago
ParCoreLab / Snoopie
Multi-GPU communication profiler and visualizer
☆31 Updated last year
ROCm / rocMLIR
☆148 Updated this week
ROCm / TransferBench
TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs)
☆44 Updated this week
NVlabs / NVBit
☆270 Updated 2 months ago
ROCm / roctracer
ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs
☆84 Updated 2 weeks ago
cyanguwa / nersc-roofline
☆45 Updated 4 years ago
sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆103 Updated 3 years ago
microsoft / NPKit
NCCL Profiling Kit
☆139 Updated last year
openucx / ucc
Unified Collective Communication Library
☆263 Updated this week
intel / cutlass-sycl
A CUTLASS implementation using SYCL
☆32 Updated 3 weeks ago
daadaada / turingas
Assembler for NVIDIA Volta and Turing GPUs
☆226 Updated 3 years ago
HAWAIILAB / cuda-flux
CUDA Flux is a profiler for GPU applications that reports the basic block execution frequencies of compute kernels
☆32 Updated 4 years ago
merthidayetoglu / CommBench
A Micro-benchmarking Tool for HPC Networks
☆32 Updated last week
ROCm / rocm_bandwidth_test
Bandwidth test for ROCm
☆62 Updated this week
ROCm / rocWMMA
rocWMMA
☆121 Updated this week
OSU-STARLAB / UVM_benchmark
☆27 Updated 4 years ago
gunrock / loops
🎃 GPU load-balancing library for regular and irregular computations.
☆62 Updated last year
eth-cscs / Tiled-MM
Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.
☆32 Updated 4 months ago