debowin / cuda-tiled-matrix-multiplication

An optimized, parallel, tiled approach to matrix multiplication that takes advantage of the lower-latency, higher-bandwidth shared memory available within GPU thread blocks.
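
To illustrate the technique named in the description, here is a minimal sketch of a shared-memory tiled kernel. It is not the repository's code: the names `TILE`, `tiledMatMulKernel`, and `tiledMatMul`, the tile width of 16, and the row-major layout are all assumptions made for this example, and CUDA error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Tile width (assumed value for illustration, not taken from the repository).
constexpr int TILE = 16;

// Computes C = A * B for row-major matrices: A is m x k, B is k x n, C is m x n.
// Each thread block produces one TILE x TILE tile of C.
__global__ void tiledMatMulKernel(const float* A, const float* B, float* C,
                                  int m, int k, int n) {
    // Per-block staging buffers in shared memory.
    __shared__ float tileA[TILE][TILE];
    __shared__ float tileB[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;  // row of C owned by this thread
    int col = blockIdx.x * TILE + threadIdx.x;  // column of C owned by this thread
    float acc = 0.0f;

    // Walk across the shared dimension k one tile at a time.
    for (int t = 0; t < (k + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;

        // Each thread loads one element of each tile; out-of-range slots get 0.
        tileA[threadIdx.y][threadIdx.x] =
            (row < m && aCol < k) ? A[row * k + aCol] : 0.0f;
        tileB[threadIdx.y][threadIdx.x] =
            (bRow < k && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();  // wait until both tiles are fully loaded

        // Partial dot product read entirely from shared memory.
        for (int i = 0; i < TILE; ++i)
            acc += tileA[threadIdx.y][i] * tileB[i][threadIdx.x];
        __syncthreads();  // wait before the tiles are overwritten
    }

    if (row < m && col < n)
        C[row * n + col] = acc;
}

// Hypothetical host wrapper: copies inputs to the GPU, launches the kernel,
// and returns the product as a row-major vector.
std::vector<float> tiledMatMul(const std::vector<float>& A,
                               const std::vector<float>& B,
                               int m, int k, int n) {
    std::vector<float> C(static_cast<size_t>(m) * n);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMalloc(&dC, C.size() * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (m + TILE - 1) / TILE);
    tiledMatMulKernel<<<grid, block>>>(dA, dB, dC, m, k, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(C.data(), dC, C.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

With this layout, each element of A and B is read from global memory once per tile rather than once per multiply-add, so global loads drop by roughly a factor of `TILE` compared with a naive kernel.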
16 · Updated 7 years ago
