andylolu2 / simpleGEMM

The simplest, yet fast, implementation of matrix multiplication (GEMM) in CUDA.
33 stars · Updated 3 months ago

Related projects

Alternatives and complementary repositories for simpleGEMM