LLNL / Aluminum
High-performance, GPU-aware communication library
☆84 · Updated last month
Related projects
Alternatives and complementary repositories for Aluminum
- Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) ☆102 · Updated last year
- Prototype of OpenSHMEM for NVIDIA GPUs, developed as part of DoE Design Forward ☆20 · Updated 6 years ago
- gossip: Efficient Communication Primitives for Multi-GPU Systems ☆58 · Updated 2 years ago
- ☆23 · Updated 3 years ago
- RAJA Performance Suite ☆110 · Updated this week
- Comb is a communication performance benchmarking tool. ☆24 · Updated last year
- SCR caches checkpoint data in storage on the compute nodes of a Linux cluster to provide a fast, scalable checkpoint / restart capability… ☆99 · Updated this week
- Sandia OpenSHMEM is an implementation of the OpenSHMEM specification over multiple Networking APIs, including Portals 4, the Open Fabric … ☆63 · Updated last week
- Integrated Performance Monitoring for High Performance Computing ☆85 · Updated 3 years ago
- ROC_SHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆41 · Updated last year
- A task benchmark ☆40 · Updated 3 months ago
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆190 · Updated this week
- Subset of BLAS routines optimized for NVIDIA GPUs ☆65 · Updated last year
- OpenSHMEM Implementation on MPI ☆25 · Updated 2 months ago
- XSBench: The Monte Carlo Macroscopic Cross Section Lookup Benchmark ☆72 · Updated 8 months ago
- Next generation LAPACK implementation for ROCm platform ☆95 · Updated this week
- CUDA Tensor Transpose (cuTT) library ☆50 · Updated 7 years ago
- Unified Collective Communication Library ☆207 · Updated last week
- A light-weight MPI profiler. ☆85 · Updated 3 months ago
- HPCG benchmark based on ROCm platform ☆35 · Updated this week
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆92 · Updated 2 years ago
- Partitioned Global Address Space (PGAS) library for distributed arrays ☆101 · Updated this week
- This tool serves as a test harness for different optimization techniques to improve stencil computations performance in shared and distri… ☆20 · Updated 2 years ago
- Copy-hiding array abstraction to automatically migrate data between memory spaces ☆106 · Updated this week
- GPUDirect Async support for IB Verbs ☆90 · Updated 2 years ago
- Reference implementations of MLPerf™ HPC training benchmarks ☆42 · Updated 5 months ago
- Distributed View Extension for Kokkos ☆43 · Updated 2 months ago
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆100 · Updated this week
- GPU Code optimizer for stencil computations. Refer to our IPDPS'19 paper for more details ☆23 · Updated 5 years ago
- TAU Performance System Public Mirror (Updated every night at midnight, USA Pacific Time) ☆39 · Updated 2 weeks ago