LLNL / Aluminum
High-performance, GPU-aware communication library
☆84 Updated 3 weeks ago
Alternatives and similar repositories for Aluminum:
Users who are interested in Aluminum are comparing it to the libraries listed below.
- Prototype of OpenSHMEM for NVIDIA GPUs, developed as part of DoE Design Forward ☆21 Updated 6 years ago
- CUDA Tensor Transpose (cuTT) library ☆51 Updated 7 years ago
- gossip: Efficient Communication Primitives for Multi-GPU Systems ☆58 Updated 2 years ago
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆48 Updated this week
- Comb is a communication performance benchmarking tool. ☆24 Updated last year
- CUDA and OpenMP implementations of C2R/R2C inplace transposition ☆46 Updated 9 years ago
- ☆23 Updated 3 years ago
- Sandia OpenSHMEM is an implementation of the OpenSHMEM specification over multiple Networking APIs, including Portals 4, the Open Fabric … ☆63 Updated last week
- ☆42 Updated 4 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆67 Updated last year
- A task benchmark ☆40 Updated 5 months ago
- ☆92 Updated 7 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 Updated 7 years ago
- RAJA Performance Suite ☆118 Updated this week
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆198 Updated last month
- Autonomic Performance Environment for eXascale (APEX) ☆42 Updated 2 weeks ago
- OpenSHMEM Implementation on MPI ☆25 Updated 4 months ago
- Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) ☆105 Updated last year
- A light-weight MPI profiler. ☆86 Updated 6 months ago
- Distributed Communication-Optimal LU-factorization Algorithm ☆12 Updated 3 years ago
- This package includes the implementation for four sparse linear algebra kernels: Sparse-Matrix-Vector-Multiplication (SpMV), Sparse-Trian… ☆26 Updated 4 years ago
- oneAPI Collective Communications Library (oneCCL) ☆218 Updated last week
- GPUDirect Async support for IB Verbs ☆95 Updated 2 years ago
- Next generation SPARSE implementation for ROCm platform ☆118 Updated this week
- A GPU accelerated error-bounded lossy compression for scientific data. ☆69 Updated this week
- DLA-Future ☆69 Updated this week
- A Micro-benchmarking Tool for HPC Networks ☆24 Updated 2 weeks ago
- Next generation LAPACK implementation for ROCm platform ☆98 Updated this week
- Integrated Performance Monitoring for High Performance Computing ☆87 Updated 3 years ago
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆93 Updated 3 years ago