LLNL / Aluminum
High-performance, GPU-aware communication library
☆85 Updated 4 months ago
Alternatives and similar repositories for Aluminum
Users who are interested in Aluminum are comparing it to the libraries listed below.
- Sandia OpenSHMEM is an implementation of the OpenSHMEM specification over multiple Networking APIs, including Portals 4, the Open Fabric … ☆69 Updated last month
- High Performance Linpack for Next-Generation AMD HPC Accelerators ☆52 Updated last week
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 Updated 7 years ago
- Reference implementations of MLPerf™ HPC training benchmarks ☆47 Updated 2 months ago
- ☆43 Updated 4 years ago
- gossip: Efficient Communication Primitives for Multi-GPU Systems ☆59 Updated 2 years ago
- Integrated Performance Monitoring for High Performance Computing ☆87 Updated 3 years ago
- CUDA Tensor Transpose (cuTT) library ☆51 Updated 7 years ago
- Autonomic Performance Environment for eXascale (APEX) ☆47 Updated this week
- ☆23 Updated 3 years ago
- DLA-Future ☆73 Updated this week
- A task benchmark ☆42 Updated 9 months ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆68 Updated 2 years ago
- GPUDirect Async support for IB Verbs ☆112 Updated 2 years ago
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆83 Updated this week
- Comb is a communication performance benchmarking tool. ☆24 Updated 2 years ago
- Prototype of OpenSHMEM for NVIDIA GPUs, developed as part of DoE Design Forward ☆24 Updated 7 years ago
- CUDA and OpenMP implementations of C2R/R2C inplace transposition ☆46 Updated 10 years ago
- RAJA Performance Suite ☆117 Updated this week
- OpenSHMEM Implementation on MPI ☆26 Updated 2 months ago
- HPCG benchmark based on ROCm platform ☆37 Updated 2 months ago
- A unified framework across multiple programming platforms ☆37 Updated 10 months ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆32 Updated last month
- Distributed View Extension for Kokkos ☆45 Updated 5 months ago
- Parallel Tensor Infrastructure (ParTI!) ☆28 Updated 4 years ago
- oneAPI Collective Communications Library (oneCCL) ☆233 Updated last week
- SCR caches checkpoint data in storage on the compute nodes of a Linux cluster to provide a fast, scalable checkpoint / restart capability… ☆102 Updated 2 months ago
- Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) ☆108 Updated 2 years ago
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆108 Updated last week
- 🎃 GPU load-balancing library for regular and irregular computations. ☆62 Updated 11 months ago