eth-cscs / COSMA
Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
☆200Updated 3 months ago
Alternatives and similar repositories for COSMA:
Users who are interested in COSMA are comparing it to the libraries listed below.
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial☆214Updated 3 months ago
- Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH)☆106Updated last year
- RAJA Performance Suite☆119Updated this week
- Next generation LAPACK implementation for ROCm platform☆99Updated this week
- Advanced Profiling and Analytics for AMD Hardware☆141Updated this week
- DBCSR: Distributed Block Compressed Sparse Row matrix library☆140Updated this week
- PaRSEC is a generic framework for architecture-aware scheduling and management of micro-tasks on distributed, GPU-accelerated, many-core …☆55Updated this week
- High-performance, GPU-aware communication library☆84Updated 2 months ago
- Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels☆329Updated this week
- Next generation library for iterative sparse solvers for ROCm platform☆78Updated this week
- 🎃 GPU load-balancing library for regular and irregular computations.☆62Updated 8 months ago
- SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) sy…☆110Updated 2 months ago
- A massively-parallel, block-sparse tensor framework written in C++☆275Updated this week
- ROCm Thrust - run Thrust dependent software on AMD GPUs☆106Updated this week
- NPBench - A Benchmarking Suite for High-Performance NumPy☆78Updated this week
- Next generation SPARSE implementation for ROCm platform☆119Updated this week
- Kokkos C++ Performance Portability Programming Ecosystem: Profiling and Debugging Tools☆121Updated last month
- ☆232Updated this week
- Intel Data Parallel C++ (and SYCL 2020) Tutorial.☆93Updated 3 years ago
- Data parallel C++ mathematical object library☆159Updated this week
- This is a set of simple programs that can be used to explore the features of a parallel platform.☆423Updated this week
- OpenMP Offloading Validation & Verification Suite; Official repository. We have migrated from bitbucket!! For documentation, results, pub…☆56Updated this week
- Kernel Tuner☆323Updated this week
- QUDA is a library for performing calculations in lattice QCD on GPUs.☆307Updated this week
- STREAM, for lots of devices written in many programming models☆328Updated 6 months ago
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems.☆51Updated 2 weeks ago
- ROCm Parallel Primitives☆170Updated this week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand-engineered and codegen kernels.☆128Updated last year
- RAJA Performance Portability Layer (C++)☆506Updated this week
- collection of benchmarks to measure basic GPU capabilities☆304Updated last month