enp1s0 / cuMpSGEMM
Fast SGEMM emulation on Tensor Cores
☆17 · Feb 16, 2025 · Updated 11 months ago
Alternatives and similar repositories for cuMpSGEMM
Users who are interested in cuMpSGEMM are comparing it to the libraries listed below:
- GEMMul8 (GEMMulate): GEMM emulation using int8 matrix engines based on the Ozaki Scheme II ☆48 · Jan 20, 2026 · Updated 3 weeks ago
- Acceleration codes for the Ozaki-scheme on integer matrix multiplication units. ☆19 · Dec 10, 2025 · Updated 2 months ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆111 · Dec 2, 2025 · Updated 2 months ago
- Flux tutorial slides and materials ☆23 · Feb 7, 2026 · Updated last week
- A C++-based implementation of the TeaLeaf heat conduction mini-app. This implementation of TeaLeaf replicates the functionality of the ref… ☆25 · Aug 11, 2024 · Updated last year
- CUDA GPU Benchmark ☆36 · Jan 31, 2025 · Updated last year
- Create and deploy virtual-experiments - co-processing computational workflows ☆10 · Jan 28, 2026 · Updated 2 weeks ago
- PARADIS, a lightweight and flexible weather forecast model that tries to Keep It Simple. ☆25 · Feb 4, 2026 · Updated last week
- Memory Topology for GPUs ☆17 · Dec 9, 2025 · Updated 2 months ago
- ext_mpi_collectives ☆11 · Apr 1, 2025 · Updated 10 months ago
- A hierarchical collective communications library with portable optimizations ☆37 · Dec 8, 2024 · Updated last year
- OpenMP offload playground ☆10 · Nov 16, 2024 · Updated last year
- Distributed Communication-Optimal Shuffle and Transpose Algorithm ☆14 · Feb 4, 2026 · Updated last week
- ☆10 · Feb 5, 2026 · Updated last week
- How to build an ACP compliant agent that uses MCP as well! ☆11 · May 6, 2025 · Updated 9 months ago
- EPOCH Input System Version 2 ☆10 · Jun 5, 2020 · Updated 5 years ago
- ☆11 · Feb 27, 2024 · Updated last year
- Performance Counter Reader ☆11 · Sep 14, 2022 · Updated 3 years ago
- GPU based 2D elastic FWI ☆11 · Mar 6, 2018 · Updated 7 years ago
- Sequential Parameter Optimization in Python ☆14 · Jan 12, 2026 · Updated last month
- Code for paper "Beyond Closure Models: Learning Chaotic Systems via Physics-Informed Neural Operators". ☆14 · Dec 24, 2025 · Updated last month
- 2D time-domain isotropic (visco)elastic FD modeling and full waveform inversion (FWI) code for SH-waves ☆13 · Aug 9, 2020 · Updated 5 years ago
- Developing multi platform gesture detector application by applying concepts learnt in Embedded Systems course on peripheral devices. ☆21 · Dec 8, 2023 · Updated 2 years ago
- Argonne Leadership Computing Facility OpenCL tutorial ☆10 · Aug 22, 2025 · Updated 5 months ago
- Build tools for Open-CE ☆13 · Nov 13, 2025 · Updated 3 months ago
- Python routines for parallel analysis of large MITgcm simulations ☆12 · Jun 23, 2016 · Updated 9 years ago
- [CVPR 2025] QuartDepth ☆16 · Mar 24, 2025 · Updated 10 months ago
- OpenVINO LLM Benchmark ☆11 · Dec 7, 2023 · Updated 2 years ago
- Continuum Dynamics Evaluation and Test Suite ☆15 · Aug 29, 2017 · Updated 8 years ago
- A development test suite for Linux SCTP project ☆13 · Jan 18, 2018 · Updated 8 years ago
- Multidimensional arrays for C++. (Not an official Boost library) This is a mirror of gitlab.com/correaa/boost-multi ☆13 · Updated this week
- SODECL is a library of ordinary differential equation (ODE) and stochastic differential equation (SDE) solvers in OpenCL. ☆11 · Jul 4, 2020 · Updated 5 years ago
- oneAPI Deep Neural Network Library (oneDNN) ☆10 · Feb 2, 2022 · Updated 4 years ago
- ExaWorks SDK ☆11 · Feb 1, 2024 · Updated 2 years ago
- ☆11 · Apr 10, 2019 · Updated 6 years ago
- ☆11 · Dec 22, 2024 · Updated last year
- Reference implementation for the climate segmentation benchmark, based on the Exascale Deep Learning for Climate Analytics work ☆10 · May 6, 2020 · Updated 5 years ago
- ☆11 · Oct 27, 2025 · Updated 3 months ago
- ☆18 · Sep 10, 2025 · Updated 5 months ago