eth-cscs/Tiled-MM

Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.

☆33

Alternatives and similar repositories for Tiled-MM

Users who are interested in Tiled-MM are comparing it to the libraries listed below.

eth-cscs / COSMA
Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
☆215 · Updated Apr 18, 2026 (3 months ago)
eth-cscs / DLA-Future
DLA-Future
☆85 · Updated Jun 19, 2026 (last month)
eth-cscs / conflux
Distributed Communication-Optimal LU-factorization Algorithm
☆12 · Updated Aug 1, 2021 (4 years ago)
YdrMaster / cuda-driver
A CUDA runtime environment built on the CUDA Driver API
☆16 · Updated Jul 30, 2025 (11 months ago)
solomonik / CANDMC
Communication Avoiding Numerical Dense Matrix Computations
☆11 · Updated Dec 20, 2020 (5 years ago)
lcy-seso / DLFrameworkTest
My tests and experiments with some popular DL frameworks.
☆17 · Updated Sep 11, 2025 (10 months ago)
LeiWang1999 / Stream-k.tvm
☆20 · Updated Sep 28, 2024 (last year)
ariasanovsky / ptx-parser
☆11 · Updated Jun 9, 2023 (3 years ago)
gty111 / PTX-EMU
PTX-EMU is a simple emulator for CUDA program.
☆40 · Updated Apr 25, 2025 (last year)
TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆13 · Updated Nov 23, 2024 (last year)
llnl / Aluminum
High-performance, GPU-aware communication library
☆90 · Updated Dec 16, 2025 (7 months ago)
eth-cscs / pytorch-training
PyTorch training at CSCS
☆22 · Updated Jul 4, 2025 (last year)
tonyzhang617 / nomad-dist
☆40 · Updated Mar 14, 2024 (2 years ago)
E3SM-Project / EKAT
Tools and libraries for writing Kokkos-enabled HPC C++ in E3SM ecosystem
☆22 · Updated this week
microsoft / FractalTensor
FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …
☆32 · Updated Dec 21, 2024 (last year)
pvthinker / wave2d
Python code to study linear wave dynamics in two dimensions
☆14 · Updated Jun 15, 2026 (last month)
KuangjuX / TileGraph
TileGraph is an experimental DNN compiler that utilizes static code generation and kernel fusion techniques.
☆11 · Updated Sep 18, 2024 (last year)
eth-cscs / SpFFT
Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support
☆55 · Updated Jul 25, 2025 (11 months ago)
getianao / ngAP
ngAP's artifact for ASPLOS'24
☆25 · Updated Jul 29, 2025 (11 months ago)
omlins / libdiffusion
Proof of Concept: a C-callable GPU-enabled parallel 2-D heat diffusion solver written in Julia using CUDA, MPI and graphics
☆24 · Updated Nov 13, 2020 (5 years ago)
jiazhihao / attention_superoptimizer
An Attention Superoptimizer
☆22 · Updated Jan 20, 2025 (last year)
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆145 · Updated Mar 31, 2023 (3 years ago)
CGCL-codes / streambox
☆18 · Updated May 28, 2024 (2 years ago)
KuangjuX / cu-x
🎉 My collection of CUDA kernels
☆11 · Updated Jun 25, 2024 (2 years ago)
microsoft / BLAS-on-flash
Linear algebra subroutines for large SSD-resident dense and sparse matrices
☆29 · Updated Dec 14, 2020 (5 years ago)
eth-cscs / spla
Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceler…
☆32 · Updated Jun 26, 2024 (2 years ago)
caijixueIT / CUDA_Learning_for_Freshman
☆14 · Updated Nov 3, 2025 (8 months ago)
ROCm / rocHPL
High Performance Linpack for Next-Generation AMD HPC Accelerators
☆73 · Updated Apr 21, 2026 (3 months ago)
tlc-pack / cutlass_fpA_intB_gemm
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
☆96 · Updated Jun 21, 2026 (last month)
eth-cscs / spack-batteries-included
Installing spack without system dependencies
☆24 · Updated Nov 7, 2022 (3 years ago)
wmmae / wmma_extension
An extension library of WMMA API (Tensor Core API)
☆115 · Updated Jul 12, 2024 (2 years ago)
stemnic / rustyvisor
Hypervisor written in Rust for the RISC-V 1.0 hypervisor extension
☆16 · Updated Oct 21, 2024 (last year)
gty111 / SimpleUseGpgpuSim
A usage guide for GPGPU-SIM
☆14 · Updated Nov 12, 2022 (3 years ago)
alan-hpc / cuda_op_benchmark
An easily extensible framework for understanding and optimizing CUDA operators, intended for learning purposes only
☆18 · Updated Jun 13, 2024 (2 years ago)
pkestene / euler_kokkos
Compressible hydro and magneto-hydrodynamics (2nd order Godunov) implemented with MPI+Kokkos
☆39 · Updated Apr 16, 2026 (3 months ago)
ROCm / roc-stdpar
☆20 · Updated Jan 17, 2024 (2 years ago)
utcs-scea / ava
Automatic virtualization of (general) accelerators.
☆47 · Updated Nov 28, 2022 (3 years ago)
llnl / gtest-mpi-listener
Header-only plugin for the Google Test framework defining listener(s) emitting sensible output when testing MPI-based, distributed-memory…
☆23 · Updated Jun 12, 2021 (5 years ago)
ROCm / rocSOLVER
[DEPRECATED] Moved to ROCm/rocm-libraries repo
☆117 · Updated Jun 8, 2026 (last month)