RIKEN-RCCS / GEMMul8
GEMMul8 (GEMMulate): GEMM emulation using int8 matrix engines based on the Ozaki Scheme II
☆25 Updated 2 weeks ago
Alternatives and similar repositories for GEMMul8
Users who are interested in GEMMul8 compare it to the libraries listed below.
- ☆74 Updated this week
- TBLIS is a library and framework for performing tensor operations, especially tensor contraction, using efficient native algorithms. ☆131 Updated 2 weeks ago
- ☆14 Updated 2 years ago
- ☆105 Updated this week
- SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) sy… ☆127 Updated 3 months ago
- Data parallel C++ mathematical object library ☆165 Updated 2 weeks ago
- DBCSR: Distributed Block Compressed Sparse Row matrix library ☆145 Updated last week
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support ☆55 Updated last month
- Distributed Communication-Optimal Shuffle and Transpose Algorithm ☆14 Updated 4 months ago
- Highly Efficient FFT for Exascale ☆42 Updated last year
- An Adaptive Pencil Decomposition Library for NVIDIA GPUs ☆69 Updated this week
- A website covering major HPC technologies, designed to welcome contributions. ☆73 Updated last year
- PaRSEC is a generic framework for architecture aware scheduling and management of micro-tasks on distributed, GPU accelerated, many-core … ☆69 Updated 3 weeks ago
- OpenMP Training Series, May to October 2024 ☆18 Updated 10 months ago
- Molecular dynamics proxy application based on Kokkos ☆33 Updated last year
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆209 Updated 4 months ago
- MILC collaboration code for lattice QCD calculations ☆43 Updated this week
- QUDA is a library for performing calculations in lattice QCD on GPUs. ☆331 Updated last week
- Training examples for SYCL ☆49 Updated 2 weeks ago
- QMCPACK miniapp: a simplified real space QMC code for algorithm development, performance portability testing, and computer science experi… ☆27 Updated last year
- ☆32 Updated 2 weeks ago
- A BUDE virtual-screening benchmark, in many programming models ☆29 Updated 10 months ago
- A C++ library for computing large scale tensor contractions. ☆38 Updated 7 years ago
- Tensor Contraction Code Generator ☆38 Updated 8 years ago
- ScaLAPACK development repository ☆154 Updated last week
- Fortran interfaces for ROCm libraries ☆81 Updated this week
- Run a parallel command inside a split tmux window ☆151 Updated 3 years ago
- Cyclops Tensor Framework: parallel arithmetic on multidimensional arrays ☆207 Updated 2 months ago
- RAJA Performance Suite ☆121 Updated this week
- Tensor Algebra Library Routines for Shared Memory Systems ☆38 Updated last year