eth-cscs / COSMA
Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
☆205 · Updated 3 weeks ago
Alternatives and similar repositories for COSMA
Users interested in COSMA are comparing it to the libraries listed below.
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial · ☆264 · Updated this week
- Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) · ☆108 · Updated 2 years ago
- DBCSR: Distributed Block Compressed Sparse Row matrix library · ☆142 · Updated last week
- Advanced Profiling and Analytics for AMD Hardware · ☆156 · Updated this week
- ☆57 · Updated 2 weeks ago
- SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) sy… · ☆119 · Updated this week
- This is a set of simple programs that can be used to explore the features of a parallel platform. · ☆432 · Updated this week
- RAJA Performance Suite · ☆117 · Updated this week
- A light-weight MPI profiler. · ☆95 · Updated 10 months ago
- Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels · ☆340 · Updated this week
- High-performance, GPU-aware communication library · ☆85 · Updated 4 months ago
- Next generation LAPACK implementation for ROCm platform · ☆101 · Updated this week
- A massively-parallel, block-sparse tensor framework written in C++ · ☆292 · Updated last week
- Next generation SPARSE implementation for ROCm platform · ☆125 · Updated this week
- ROCm Thrust - run Thrust dependent software on AMD GPUs · ☆119 · Updated this week
- RAJA Performance Portability Layer (C++) · ☆519 · Updated this week
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. · ☆93 · Updated 3 years ago
- Kernel Tuner · ☆337 · Updated this week
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. · ☆86 · Updated this week
- Cyclops Tensor Framework: parallel arithmetic on multidimensional arrays · ☆205 · Updated 9 months ago
- Partitioned Global Address Space (PGAS) library for distributed arrays · ☆105 · Updated 2 weeks ago
- ☆95 · Updated this week
- STREAM, for lots of devices written in many programming models · ☆339 · Updated 9 months ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API · ☆82 · Updated last year
- ☆91 · Updated 8 years ago
- Training examples for SYCL · ☆42 · Updated last month
- The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information. · ☆214 · Updated this week
- Information about many aspects of high-performance computing. Wiki content moved to ~/docs. · ☆291 · Updated last month
- High Performance Linpack for Next-Generation AMD HPC Accelerators · ☆54 · Updated last week
- PaRSEC is a generic framework for architecture aware scheduling and management of micro-tasks on distributed, GPU accelerated, many-core … · ☆58 · Updated 2 weeks ago