eth-cscs / COSMA
Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
☆205 · Updated last month
Alternatives and similar repositories for COSMA:
Users interested in COSMA are comparing it to the libraries listed below:
- DBCSR: Distributed Block Compressed Sparse Row matrix library ☆142 · Updated this week
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆256 · Updated last month
- RAJA Performance Suite ☆117 · Updated 3 weeks ago
- Next generation LAPACK implementation for ROCm platform ☆100 · Updated this week
- Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) ☆108 · Updated last year
- Advanced Profiling and Analytics for AMD Hardware ☆152 · Updated this week
- SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) sy… ☆113 · Updated 3 months ago
- RAJA Performance Portability Layer (C++) ☆516 · Updated this week
- NPBench - A Benchmarking Suite for High-Performance NumPy ☆80 · Updated last week
- ☆50 · Updated this week
- PaRSEC is a generic framework for architecture aware scheduling and management of micro-tasks on distributed, GPU accelerated, many-core … ☆57 · Updated 2 weeks ago
- Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels ☆336 · Updated this week
- A massively-parallel, block-sparse tensor framework written in C++ ☆285 · Updated this week
- A light-weight MPI profiler. ☆94 · Updated 9 months ago
- Next generation SPARSE implementation for ROCm platform ☆121 · Updated this week
- This is a set of simple programs that can be used to explore the features of a parallel platform. ☆431 · Updated 2 weeks ago
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support ☆53 · Updated 2 months ago
- High-performance, GPU-aware communication library ☆85 · Updated 3 months ago
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆108 · Updated this week
- ROCm BLAS marshalling library ☆140 · Updated this week
- Data parallel C++ mathematical object library ☆163 · Updated this week
- A task benchmark ☆42 · Updated 9 months ago
- Stretching GPU performance for GEMMs and tensor contractions. ☆237 · Updated last week
- Subset of BLAS routines optimized for NVIDIA GPUs ☆68 · Updated 2 years ago
- Partitioned Global Address Space (PGAS) library for distributed arrays ☆102 · Updated this week
- Next generation library for iterative sparse solvers for ROCm platform ☆81 · Updated this week
- ☆91 · Updated 8 years ago
- Information about many aspects of high-performance computing. Wiki content moved to ~/docs. ☆290 · Updated this week
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆78 · Updated this week
- OpenMP Offloading Validation & Verification Suite; official repository. We have migrated from Bitbucket! For documentation, results, pub… ☆58 · Updated this week