[DEPRECATED] Moved to ROCm/rocm-systems repo
☆144 Feb 23, 2026 Updated last week
Alternatives and similar repositories for rocSHMEM
Users who are interested in rocSHMEM are comparing it to the libraries listed below.
- TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs) ☆57 Updated this week
- AMD’s C++ library for accelerating tensor primitives ☆49 Feb 18, 2026 Updated last week
- ☆60 Feb 23, 2026 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆86 Feb 11, 2026 Updated 2 weeks ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆26 Jan 21, 2026 Updated last month
- NCCL Profiling Kit ☆152 Jul 1, 2024 Updated last year
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆84 Feb 11, 2026 Updated 2 weeks ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆165 Feb 16, 2026 Updated 2 weeks ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆411 Feb 23, 2026 Updated last week
- ☆17 Nov 11, 2025 Updated 3 months ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆154 Jan 21, 2026 Updated last month
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… ☆469 Updated this week
- Debug print operator for cudagraph debugging ☆14 Aug 2, 2024 Updated last year
- Intel® SHMEM - Device initiated shared memory based communication library ☆32 Nov 12, 2025 Updated 3 months ago
- HPCG benchmark based on ROCm platform ☆39 Updated this week
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆93 Jan 16, 2026 Updated last month
- Microsoft Collective Communication Library ☆66 Nov 23, 2024 Updated last year
- ☆38 Aug 7, 2025 Updated 6 months ago
- Microsoft Collective Communication Library ☆385 Sep 20, 2023 Updated 2 years ago
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆177 Updated this week
- ☆26 May 19, 2021 Updated 4 years ago
- AMD lab notes with code examples to demonstrate use of AMD GPUs ☆110 Jun 28, 2024 Updated last year
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆178 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆139 Updated this week
- NVIDIA Inference Xfer Library (NIXL) ☆898 Updated this week
- Autonomous GPU Kernel Generation & Optimization via Deep Agents ☆242 Updated this week
- Distributed Compiler based on Triton for Parallel Systems ☆1,361 Feb 13, 2026 Updated 2 weeks ago
- ☆24 May 9, 2025 Updated 9 months ago
- ☆18 Nov 11, 2025 Updated 3 months ago
- Sample Codes using NVSHMEM on Multi-GPU ☆30 Jan 22, 2023 Updated 3 years ago
- A lightweight design for computation-communication overlap. ☆223 Jan 20, 2026 Updated last month
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆84 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆69 Feb 10, 2026 Updated 2 weeks ago
- Experimental Explicit Communications API for Kokkos ☆30 Feb 20, 2026 Updated last week
- Synthesizer for optimal collective communication algorithms ☆124 Apr 8, 2024 Updated last year
- [DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror ☆521 Updated this week
- CMake modules used within the ROCm libraries ☆73 Feb 23, 2026 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆124 Updated this week
- An extension of TVMScript to write simple and high performance GPU kernels with tensorcore. ☆51 Jul 23, 2024 Updated last year