eth-cscs / COSMA
Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
☆201 Updated 3 months ago
Alternatives and similar repositories for COSMA:
Users who are interested in COSMA are comparing it to the libraries listed below.
- Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) ☆106 Updated last year
- DBCSR: Distributed Block Compressed Sparse Row matrix library ☆141 Updated this week
- RAJA Performance Suite ☆118 Updated last week
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆243 Updated 3 months ago
- High-performance, GPU-aware communication library ☆84 Updated 2 months ago
- NPBench - A Benchmarking Suite for High-Performance NumPy ☆80 Updated this week
- A massively-parallel, block-sparse tensor framework written in C++ ☆278 Updated this week
- TBLIS is a library and framework for performing tensor operations, especially tensor contraction, using efficient native algorithms. ☆119 Updated 5 months ago
- Cyclops Tensor Framework: parallel arithmetic on multidimensional arrays ☆203 Updated 7 months ago
- RAJA Performance Portability Layer (C++) ☆507 Updated this week
- A light-weight MPI profiler. ☆89 Updated 7 months ago
- SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) sy… ☆110 Updated 2 months ago
- PaRSEC is a generic framework for architecture aware scheduling and management of micro-tasks on distributed, GPU accelerated, many-core … ☆55 Updated last week
- Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels ☆331 Updated this week
- Data parallel C++ mathematical object library ☆160 Updated last week
- STREAM, for lots of devices written in many programming models ☆329 Updated 6 months ago
- Next generation LAPACK implementation for ROCm platform ☆99 Updated this week
- Next generation library for iterative sparse solvers for ROCm platform ☆78 Updated this week
- QUDA is a library for performing calculations in lattice QCD on GPUs. ☆307 Updated this week
- This is a set of simple programs that can be used to explore the features of a parallel platform. ☆423 Updated last week
- Tutorials for the Kokkos C++ Performance Portability Programming Ecosystem ☆316 Updated 3 weeks ago
- Advanced Profiling and Analytics for AMD Hardware ☆142 Updated last week
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆261 Updated 2 months ago
- Partitioned Global Address Space (PGAS) library for distributed arrays ☆101 Updated this week
- DaCe - Data Centric Parallel Programming ☆515 Updated this week
- Intermediate MPI lesson ☆26 Updated last year
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆106 Updated this week
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆51 Updated 3 weeks ago