enp1s0/cuMpSGEMM

Fast SGEMM emulation on Tensor Cores

☆17

Alternatives and similar repositories for cuMpSGEMM

Users that are interested in cuMpSGEMM are comparing it to the libraries listed below.

RIKEN-RCCS / GEMMul8
GEMMul8 (GEMMulate): GEMM emulation and its extension to BLAS-like matrix operations using INT8/FP8 matrix engines based on the Ozaki Sch…
☆83 · Jul 12, 2026 · Updated last week
enp1s0 / ozIMMU
FP64 equivalent GEMM by the Ozaki scheme with Int8 Tensor Cores
☆125 · Dec 2, 2025 · Updated 7 months ago
wudu98 / autoGEMM
☆15 · Dec 5, 2024 · Updated last year
RIKEN-RCCS / accelerator_for_ozIMMU
Acceleration codes for the Ozaki-scheme on integer matrix multiplication units.
☆26 · Dec 10, 2025 · Updated 7 months ago
Pranavchiku / Gesture-Detection-Application
Developing a multi-platform gesture detector application by applying concepts learnt in an Embedded Systems course on peripheral devices.
☆21 · Dec 8, 2023 · Updated 2 years ago
Tencent / BlazerML-tvm
Tencent Distribution of TVM
☆16 · Apr 7, 2023 · Updated 3 years ago
aiwl / fluids
Simple "Stable Fluids" Implementation
☆16 · Mar 25, 2024 · Updated 2 years ago
flux-framework / Tutorials
Flux tutorial slides and materials
☆25 · Updated this week
HuyNguyen-hust / hopper-gemm-101
☆13 · Dec 22, 2024 · Updated last year
FlorianRhiem / VFRendering
A vector field rendering library
☆17 · Jul 31, 2019 · Updated 6 years ago
UoB-HPC / TeaLeaf
A C++-based implementation of the TeaLeaf heat conduction mini-app. This implementation of TeaLeaf replicates the functionality of the ref…
☆25 · Aug 11, 2024 · Updated last year
munstermonster / cuSten
CUDA Finite Difference Library
☆16 · Aug 21, 2020 · Updated 5 years ago
Faraz9877 / H100_GEMM
High-performance GEMM implementation optimized for NVIDIA H100 GPUs, leveraging Hopper architecture's TMA, WGMMA, and Thread Block Cluste…
☆11 · Dec 4, 2024 · Updated last year
lindemeier / PaintMixer
Digital paint mixing program based on the Kubelka-Munk equations. Implementation of: T. Lindemeier, J. M. Gülzow, and O. Deussen. 2018…
☆14 · Sep 10, 2020 · Updated 5 years ago
LighthouseHPC / lighthouse
☆11 · Apr 10, 2019 · Updated 7 years ago
pnnl / memgaze
☆18 · Jun 16, 2026 · Updated last month
eth-cscs / COSTA
Distributed Communication-Optimal Shuffle and Transpose Algorithm
☆14 · Apr 18, 2026 · Updated 3 months ago
ohjay / stable_fluids
Stable, numerical Navier-Stokes solver for use in real-time simulation
☆16 · Apr 6, 2021 · Updated 5 years ago
basnijholt / variational-quantum-monte-carlo
2014: Variational Monte Carlo for the harmonic oscillator, helium, hydrogen and H2 - IPython notebook and FORTRAN90
☆13 · Jun 23, 2016 · Updated 10 years ago
PoCInnovation / UberPoC
An autonomous car able to ride safely.
☆13 · Jan 31, 2021 · Updated 5 years ago
RRZE-HPC / GHOST
General, Hybrid and Optimized Sparse Toolkit (Bitbucket mirror)
☆12 · Apr 8, 2021 · Updated 5 years ago
ShaYeBuHui01 / flash_attention_inference
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
☆15 · Aug 31, 2023 · Updated 2 years ago
thevasudevgupta / gpt-triton
Triton implementation of GPT/LLAMA
☆22 · Aug 28, 2024 · Updated last year
pytorch-tpu / llama
Inference code for LLaMA models
☆20 · Apr 3, 2025 · Updated last year
pwrliang / RayJoin
This repo contains the code of the paper "RayJoin: Fast and Precise Spatial Join", ICS'24
☆12 · Updated this week
ecrc / polar
Distributed-memory, double-precision, polar decomposition (QDWH/ZOLO-PD) of a dense matrix, svd (QDWH/ZOLOPD-SVD) of a dense matrix
☆14 · Jun 3, 2020 · Updated 6 years ago
HPAC / ELAPS
Experimental Linear Algebra Performance Studies
☆12 · Feb 24, 2017 · Updated 9 years ago
FZJ-JSC / jube-configs
JUBE benchmarking environment configuration files
☆10 · Oct 1, 2015 · Updated 10 years ago
shriramsb / vdnn-plus-plus
Implementation of vDNN++; an improvement over vDNN
☆18 · Dec 7, 2018 · Updated 7 years ago
lindemeier / painty
C++ library for non photorealistic rendering. Includes a paint renderer based on Kubelka Munk equations, stroke based rendering algorithm…
☆30 · Nov 22, 2020 · Updated 5 years ago
noagarcia / ROLL-VideoQA
PyTorch code for ROLL, a knowledge-based video story question answering model.
☆21 · Sep 29, 2020 · Updated 5 years ago
hibagus / CUDA_Bench
CUDA GPU Benchmark
☆38 · Jan 31, 2025 · Updated last year
RGivisiez / Heisenberg-SSE
Stochastic Series Expansion (SSE) for an isotropic S=1/2 antiferromagnetic quantum Heisenberg model in 1D, 2D or 3D lattice. Every lattic…
☆15 · Jan 23, 2021 · Updated 5 years ago
forhappy / CS61
CS61 learning schedules and assessments
☆16 · Dec 6, 2011 · Updated 14 years ago
jagot / ThreadedSparseArrays.jl
☆19 · Dec 27, 2023 · Updated 2 years ago
logological / heria
A LaTeX class for Horizon Europe RIA and IA grant proposals
☆17 · Aug 17, 2025 · Updated 11 months ago
ChASE-library / ChASE
This repository mirrors the principal Gitlab repository of the Chebyshev Accelerated Subspace iteration Eigensolver. If you want to contr…
☆20 · Jul 8, 2026 · Updated 2 weeks ago
ian-r-rose / buckinghampy
Teaching tool for the Buckingham Pi theorem. With a terribly obvious name.
☆17 · Apr 11, 2018 · Updated 8 years ago
rmsrosa / UnitfulBuckinghamPi.jl
Solve for the adimensional Pi groups in a list of Unitful parameters, according to the Buckingham-Pi Theorem.
☆18 · Mar 4, 2025 · Updated last year