RIKEN-RCCS/GEMMul8

RIKEN-RCCS / GEMMul8

GEMMul8 (GEMMulate): GEMM emulation and its extension to BLAS-like matrix operations using INT8/FP8 matrix engines based on the Ozaki Scheme II

☆82

Alternatives and similar repositories for GEMMul8

Users that are interested in GEMMul8 are comparing it to the libraries listed below.

RIKEN-RCCS / accelerator_for_ozIMMU
Acceleration codes for the Ozaki-scheme on integer matrix multiplication units.
☆26 · Dec 10, 2025 · Updated 7 months ago
enp1s0 / ozIMMU
FP64 equivalent GEMM by the Ozaki scheme with Int8 Tensor Cores
☆125 · Dec 2, 2025 · Updated 7 months ago
enp1s0 / cuMpSGEMM
Fast SGEMM emulation on Tensor Cores
☆17 · Feb 16, 2025 · Updated last year
SC-SGS / hardware_sampling
The Hardware Sampling (hws) library can be used to track hardware performance like clock frequency, memory usage, temperatures, or power …
☆21 · Jun 25, 2026 · Updated 3 weeks ago
NMSU-PEARL / GPUs-Energy
[CF ’20] Verified Instruction-Level Energy Consumption Measurement for NVIDIA GPUs
☆15 · Dec 11, 2020 · Updated 5 years ago
itoyori / itoyori
Itoyori: A distributed multi-threading runtime system for global-view fork-join task parallelism
☆23 · Feb 9, 2024 · Updated 2 years ago
SourceryTools / fortran-cuda-interfaces
☆23 · Jul 8, 2024 · Updated 2 years ago
escalab / RTSpMSpM
☆25 · Apr 13, 2025 · Updated last year
ye-luo / openmp-target
OpenMP offload playground
☆10 · Nov 16, 2024 · Updated last year
RIKEN-RCCS / RAPTOR
☆18 · Jun 26, 2026 · Updated 3 weeks ago
scale-snu / layered-prefill
Layered prefill changes the scheduling axis from tokens to layers and removes redundant MoE weight reloads while keeping decode stall fre…
☆18 · Mar 9, 2026 · Updated 4 months ago
bienz2 / BibChecker
Analyzes IEEE and ACM format bibliographies for correctness. Only to be used as a first pass. Anything that cannot be found automatical…
☆26 · Updated this week
wmmae / wmma_extension
An extension library of WMMA API (Tensor Core API)
☆115 · Jul 12, 2024 · Updated 2 years ago
nakatamaho / dgemm_tutorial
A gemm_tutorial
☆38 · May 31, 2025 · Updated last year
tkotani / ecalj
The quasiparticle self-consistent GW method in the PMT method (LAPW+LMTO+Lo).
☆36 · Jun 7, 2026 · Updated last month
mthom / unmanaged-ctrie
Concurrent hash tries for C++ 14 with no memory management whatsoever.
☆10 · Aug 30, 2016 · Updated 9 years ago
ucb-bar / radiance
A Heterogeneous GPU Platform for AI and Neural Graphics
☆61 · Jun 22, 2026 · Updated 3 weeks ago
ParCoreLab / aCG
GPU-accelerated linear solvers based on the conjugate gradient (CG) method, supporting NVIDIA and AMD GPUs with GPU-aware MPI, NCCL, RCCL…
☆16 · Mar 14, 2026 · Updated 4 months ago
vatai / tadashi
A library for code transformations with guaranteed legality
☆18 · Jun 12, 2026 · Updated last month
It4innovations / Intel-SDE-FLOPS
Computing FLOPs with Intel Software Development Emulator (Intel SDE)
☆27 · Oct 22, 2023 · Updated 2 years ago
DanieleDeSensi / mammut
MAchine Micro Management UTilities
☆12 · Nov 5, 2020 · Updated 5 years ago
valentjn / uni-stuttgart-beamer-template
Unofficial LaTeX template for Beamer presentations at the University of Stuttgart, Germany
☆26 · Jul 11, 2023 · Updated 3 years ago
daisytuner / docc
Daisytuner Optimizing Compiler Collection (docc)
☆22 · Updated this week
Naifeng / moma
a method for efficient large integer arithmetic in cryptography
☆16 · Sep 16, 2025 · Updated 10 months ago
wavefunction91 / ExchCXX
Exchange correlation (XC) library for density functional theory (DFT) calculations in modern C++
☆28 · Jun 12, 2026 · Updated last month
olcf / hip-training-series
Repository with examples and exercises for OLCF and AMD's HIP training series
☆17 · Oct 16, 2023 · Updated 2 years ago
Cerebras / sdk-examples
☆47 · Apr 27, 2026 · Updated 2 months ago
perazz / fastmath
A Modern Fortran library for fast, approximate math functions
☆17 · Jan 22, 2023 · Updated 3 years ago
simint-chem / simint-generator
Code generator for simint vectorized integrals
☆29 · Mar 16, 2023 · Updated 3 years ago
yc2367 / BBS-MICRO
☆19 · Nov 11, 2024 · Updated last year
avr-aics-riken / FFVC
FFVC - Frontflow/violet Cartesian
☆14 · Apr 5, 2020 · Updated 6 years ago
bollu / polymage
PolyMage is a domain-specific language and optimizing code generator for auto-parallelisation
☆14 · Jul 15, 2016 · Updated 10 years ago
periscop / clan
Chunky Loop Analyzer: A Polyhedral Representation Extraction Tool for High Level Programs
☆26 · Dec 19, 2022 · Updated 3 years ago
wudu98 / autoGEMM
☆15 · Dec 5, 2024 · Updated last year
brightlaboratory / polydl
☆11 · Jun 29, 2021 · Updated 5 years ago
shixun404 / Fault-Tolerant-SGEMM-on-NVIDIA-GPUs
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs
☆14 · Apr 3, 2025 · Updated last year
BL-highprecision / QD
A double-double and quad-double package for Fortran and C++
☆23 · May 7, 2026 · Updated 2 months ago
usagrada / satysfi-formatter
Formatter for SATySFi
☆17 · Jun 7, 2025 · Updated last year
spcl / SMI
Streaming Message Interface: High-Performance Distributed Memory Programming on Reconfigurable Hardware
☆15 · Mar 1, 2022 · Updated 4 years ago