CNugteren/myGEMM

Code appendix to an OpenCL matrix-multiplication tutorial

☆179

Alternatives and similar repositories for myGEMM

Users that are interested in myGEMM are comparing it to the libraries listed below.

CNugteren / CLBlast
Tuned OpenCL BLAS
☆1,185 · Apr 13, 2026 · Updated 3 months ago
CNugteren / CLTune
CLTune: An automatic OpenCL & CUDA kernel tuner
☆186 · Dec 12, 2022 · Updated 3 years ago
clMathLibraries / clBLAS
a software library containing BLAS functions written in OpenCL
☆866 · Aug 2, 2024 · Updated last year
CNugteren / CLCudaAPI
A portable high-level API with CUDA or OpenCL back-end
☆56 · Oct 8, 2017 · Updated 8 years ago
huytd / opencl-benchmark
Sample program to compare calculation performance between CPU and GPU
☆16 · Oct 27, 2016 · Updated 9 years ago
NervanaSystems / maxas
Assembler for NVIDIA Maxwell architecture
☆1,074 · Jan 3, 2023 · Updated 3 years ago
swenson / aesrng
AES-based random number generator in C
☆11 · Apr 27, 2015 · Updated 11 years ago
krrishnarraj / clpeak
A synthetic micro-benchmark that measures peak compute, bandwidth, and matrix throughput of GPUs and CPUs
☆506 · Jul 21, 2026 · Updated last week
openai / openai-gemm
Open single and half precision gemm implementations
☆396 · Apr 2, 2023 · Updated 3 years ago
BenjaminW3 / matmul
Sequential and parallel GEMM implementations with C interface + Benchmark.
☆12 · May 24, 2016 · Updated 10 years ago
ysh329 / OpenCL-101
Learn OpenCL step by step.
☆140 · Aug 30, 2022 · Updated 3 years ago
flame / how-to-optimize-gemm
☆2,025 · Jul 29, 2023 · Updated 3 years ago
rohithj / Xeon-CafPhi
Caffe deep learning framework - optimized for Xeon Phi
☆14 · May 12, 2015 · Updated 11 years ago
Sable / fait-maison-spmv
Sparse Matrix-Vector Multiplication implementations in C
☆22 · Dec 7, 2022 · Updated 3 years ago
hyqneuron / asfermi
assembler for NVIDIA FERMI. Imported from Google Code
☆77 · Mar 22, 2015 · Updated 11 years ago
GPUPeople / spECK
Efficient SpGEMM on GPU using CUDA and CSR
☆61 · Jul 18, 2023 · Updated 3 years ago
vetter / shoc
The SHOC Benchmark Suite
☆262 · Oct 6, 2025 · Updated 9 months ago
csehydrogen / Winograd-OpenCL
Winograd-based convolution implementation in OpenCL
☆29 · Jan 22, 2017 · Updated 9 years ago
pigirons / spmv
This is a tuned sparse matrix dense vector multiplication (SpMV) library
☆23 · Mar 21, 2016 · Updated 10 years ago
YusukeNagasaka / Batched-SpMM
New batched algorithm for sparse matrix-matrix multiplication (SpMM)
☆16 · May 7, 2019 · Updated 7 years ago
apuaaChen / vectorSparse
☆32 · Aug 24, 2022 · Updated 3 years ago
yzhaiustc / Optimizing-SGEMV-on-NVIDIA-GPUs
An implementation of SGEMV with performance comparable to cuBLAS.
☆12 · May 21, 2021 · Updated 5 years ago
iamalbert / torch-word-emb
load word embeddings to Torch.Tensor
☆14 · May 12, 2016 · Updated 10 years ago
PeterTh / uCLbench
Set of OpenCL microbenchmarks
☆29 · Nov 19, 2025 · Updated 8 months ago
fengzhangcs / CoRunBench
Benchmark for Co-running Single Applications on Integrated Architectures
☆12 · Jul 7, 2016 · Updated 10 years ago
thoppe / Cayley-Dickson
Cayley Dickson algebra implementation in python
☆13 · Jan 3, 2019 · Updated 7 years ago
HandsOnOpenCL / Lecture-Slides
Lecture Slide Issue Tracking
☆258 · May 20, 2018 · Updated 8 years ago
nerdralph / cl-mem
OpenCL memory benchmark
☆15 · Dec 21, 2016 · Updated 9 years ago
adwaitjog / mafia
MAFIA: Multiple Application Framework for GPU architectures
☆28 · Jan 21, 2022 · Updated 4 years ago
linnanwang / BLASX
a heterogeneous multiGPU level-3 BLAS library
☆46 · Dec 9, 2019 · Updated 6 years ago
jrprice / Oclgrind
An OpenCL device simulator and debugger
☆373 · Mar 24, 2026 · Updated 4 months ago
vtsynergy / OpenDwarfs
The OpenDwarfs project provides a benchmark suite consisting of different computation/communication idioms, i.e., dwarfs, for state-of-ar…
☆102 · Sep 18, 2019 · Updated 6 years ago
yywyz / OpenCL-Programming-Examples
OpenCL Programming Examples
☆22 · Jul 21, 2018 · Updated 8 years ago
MegEngine / MegPeak
☆256 · Sep 15, 2023 · Updated 2 years ago
leonardt / hwtypes
Python implementations of fixed size hardware types (Bit, BitVector, UInt, SInt, ...) based on the SMT-LIB2 semantics
☆18 · Sep 13, 2023 · Updated 2 years ago
lunochod / caffe
Caffe: a fast open framework for deep learning.
☆14 · Aug 26, 2015 · Updated 10 years ago
tpoisonooo / how-to-optimize-gemm
row-major matmul optimization
☆744 · May 14, 2026 · Updated 2 months ago
andravin / wincnn
Winograd minimal convolution algorithm generator for convolutional neural networks.
☆628 · Feb 9, 2026 · Updated 5 months ago
weifengliu-ssslab / Benchmark_SpMV_using_CSR5
CSR5-based SpMV on CPUs, GPUs and Xeon Phi
☆111 · Jun 10, 2024 · Updated 2 years ago