RIKEN-RCCS / GEMMul8
GEMMul8 (GEMMulate): GEMM emulation using int8 matrix engines based on the Ozaki Scheme II
☆26 Updated this week
Alternatives and similar repositories for GEMMul8
Users who are interested in GEMMul8 are comparing it to the libraries listed below.
- ☆113 Updated last week
- ☆77 Updated last month
- Highly Efficient FFT for Exascale ☆42 Updated last year
- SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) sy… ☆129 Updated 4 months ago
- A website covering major HPC technologies, designed to welcome contributions. ☆76 Updated last year
- ☆14 Updated 3 years ago
- PaRSEC is a generic framework for architecture aware scheduling and management of micro-tasks on distributed, GPU accelerated, many-core … ☆71 Updated last month
- DBCSR: Distributed Block Compressed Sparse Row matrix library ☆144 Updated this week
- QUDA is a library for performing calculations in lattice QCD on GPUs. ☆329 Updated last week
- Training examples for SYCL ☆49 Updated last month
- Data parallel C++ mathematical object library ☆165 Updated last month
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆209 Updated 4 months ago
- Distributed Communication-Optimal Shuffle and Transpose Algorithm ☆14 Updated 5 months ago
- Molecular dynamics proxy application based on Kokkos ☆33 Updated last year
- MILC collaboration code for lattice QCD calculations ☆43 Updated last week
- TBLIS is a library and framework for performing tensor operations, especially tensor contraction, using efficient native algorithms. ☆133 Updated this week
- An Adaptive Pencil Decomposition Library for NVIDIA GPUs ☆68 Updated last week
- RAJA Performance Suite ☆123 Updated this week
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support ☆55 Updated 2 months ago
- A parallel programming training mini app simulating weather-like flows ☆168 Updated last month
- A C++ library for computing large scale tensor contractions. ☆38 Updated 7 years ago
- The Chroma Software System for Lattice QCD ☆66 Updated this week
- ALCF Systems User Documentation ☆29 Updated this week
- ☆13 Updated last week
- A BUDE virtual-screening benchmark, in many programming models ☆29 Updated 11 months ago
- CPE change log and release notes ☆26 Updated last year
- Fortran interfaces for ROCm libraries ☆81 Updated this week
- A flexible, templated GPU library of neighbor search algorithms. ☆12 Updated 4 years ago
- Run a parallel command inside a split tmux window ☆152 Updated 3 years ago
- QMCPACK miniapp: a simplified real space QMC code for algorithm development, performance portability testing, and computer science experi… ☆27 Updated last year