RIKEN-RCCS / GEMMul8
GEMMul8 (GEMMulate): GEMM emulation using int8 matrix engines based on the Ozaki Scheme II
☆22Updated this week
Alternatives and similar repositories for GEMMul8
Users who are interested in GEMMul8 compare it to the libraries listed below.
- ☆65Updated 2 weeks ago
- SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) sy…☆124Updated 2 months ago
- Training examples for SYCL☆49Updated last week
- PaRSEC is a generic framework for architecture-aware scheduling and management of micro-tasks on distributed, GPU-accelerated, many-core …☆68Updated 2 months ago
- ☆102Updated last week
- DBCSR: Distributed Block Compressed Sparse Row matrix library☆144Updated this week
- Data parallel C++ mathematical object library☆163Updated 3 weeks ago
- MILC collaboration code for lattice QCD calculations☆41Updated this week
- ☆14Updated 2 years ago
- Highly Efficient FFT for Exascale☆39Updated last year
- Molecular dynamics proxy application based on Kokkos☆34Updated last year
- A website covering major HPC technologies, designed to welcome contributions.☆73Updated last year
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support☆55Updated 2 weeks ago
- An Adaptive Pencil Decomposition Library for NVIDIA GPUs☆66Updated 2 weeks ago
- TBLIS is a library and framework for performing tensor operations, especially tensor contraction, using efficient native algorithms.☆129Updated last week
- ScaLAPACK development repository☆151Updated last week
- Run a parallel command inside a split tmux window☆150Updated 3 years ago
- QMCPACK miniapp: a simplified real space QMC code for algorithm development, performance portability testing, and computer science experi…☆27Updated last year
- ALCF Systems User Documentation☆28Updated last week
- QUDA is a library for performing calculations in lattice QCD on GPUs.☆328Updated 2 weeks ago
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm☆209Updated 3 months ago
- Distributed Communication-Optimal Shuffle and Transpose Algorithm☆14Updated 3 months ago
- The Chroma Software System for Lattice QCD☆66Updated 3 months ago
- CPE change log and release notes☆26Updated 11 months ago
- A parallel programming training mini app simulating weather-like flows☆163Updated 6 months ago
- RAJA Performance Suite☆119Updated last week
- GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development.☆35Updated 4 months ago
- A C++ library for computing large scale tensor contractions.☆38Updated 7 years ago
- A BUDE virtual-screening benchmark, in many programming models☆29Updated 9 months ago
- Distributed memory, MPI based SuperLU☆206Updated this week