CNugteren / myGEMM
Code appendix to an OpenCL matrix-multiplication tutorial
☆179 · Feb 7, 2017 · Updated 9 years ago
Alternatives and similar repositories for myGEMM
Users who are interested in myGEMM are comparing it to the libraries listed below.
- Tuned OpenCL BLAS · ☆1,166 · Feb 1, 2026 · Updated 2 weeks ago
- CLTune: An automatic OpenCL & CUDA kernel tuner · ☆185 · Dec 12, 2022 · Updated 3 years ago
- a software library containing BLAS functions written in OpenCL · ☆865 · Aug 2, 2024 · Updated last year
- A portable high-level API with CUDA or OpenCL back-end · ☆56 · Oct 8, 2017 · Updated 8 years ago
- OpenCL tool to detect buffer overflows in GPU kernels · ☆22 · Jan 7, 2019 · Updated 7 years ago
- Assembler for NVIDIA Maxwell architecture · ☆1,059 · Jan 3, 2023 · Updated 3 years ago
- Open single and half precision gemm implementations · ☆398 · Apr 2, 2023 · Updated 2 years ago
- This is a tuned sparse matrix dense vector multiplication (SpMV) library · ☆22 · Mar 21, 2016 · Updated 9 years ago
- profiling gemm on android · ☆10 · Apr 1, 2016 · Updated 9 years ago
- A simple example of using the SDAccel build flow for AWS EC2's F1 instance type. Tries to avoid magic makefiles. · ☆10 · Aug 27, 2017 · Updated 8 years ago
- Sparse Matrix-Vector Multiplication implementations in C · ☆22 · Dec 7, 2022 · Updated 3 years ago
- assembler for NVIDIA FERMI. Imported from Google Code · ☆75 · Mar 22, 2015 · Updated 10 years ago
- ☆1,990 · Jul 29, 2023 · Updated 2 years ago
- C99/C++ header-only library for division via fixed-point multiplication by inverse · ☆59 · Apr 14, 2024 · Updated last year
- Learn OpenCL step by step. · ☆138 · Aug 30, 2022 · Updated 3 years ago
- OpenCL API, OpenCL C, Extensions, SPIR-V Environment Specs, Ref page, and C++ for OpenCL doc sources. · ☆403 · Jan 30, 2026 · Updated 2 weeks ago
- Benchmark for Co-running Single Applications on Integrated Architectures · ☆12 · Jul 7, 2016 · Updated 9 years ago
- MAFIA: Multiple Application Framework for GPU architectures · ☆28 · Jan 21, 2022 · Updated 4 years ago
- ☆27 · Oct 26, 2019 · Updated 6 years ago
- a heterogeneous multiGPU level-3 BLAS library · ☆46 · Dec 9, 2019 · Updated 6 years ago
- An OpenCL device simulator and debugger · ☆369 · Feb 6, 2026 · Updated last week
- Generating Families of Practical Fast Matrix Multiplication Algorithms · ☆12 · Jul 7, 2017 · Updated 8 years ago
- Yet Another Sokoban Solver and Optimizer - for Android · ☆16 · Jan 27, 2026 · Updated 3 weeks ago
- The repository targets the OpenCL gemm function performance optimization. It compares several libraries clBLAS, clBLAST, MIOpenGemm, Inte… · ☆17 · Mar 28, 2019 · Updated 6 years ago
- Lecture Slide Issue Tracking · ☆256 · May 20, 2018 · Updated 7 years ago
- ☆32 · Aug 24, 2022 · Updated 3 years ago
- Winograd minimal convolution algorithm generator for convolutional neural networks. · ☆626 · Feb 9, 2026 · Updated last week
- An implementation of SGEMV with performance comparable to cuBLAS. · ☆12 · May 21, 2021 · Updated 4 years ago
- New batched algorithm for sparse matrix-matrix multiplication (SpMM) · ☆16 · May 7, 2019 · Updated 6 years ago
- Mirror of http://gitlab.hpcrl.cse.ohio-state.edu/chong/ppopp19_ae, refactoring for understanding · ☆15 · Oct 20, 2021 · Updated 4 years ago
- BLAS OpenCL implementation. · ☆16 · Apr 8, 2015 · Updated 10 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo · ☆255 · Feb 9, 2026 · Updated last week
- How to read Lean · ☆22 · Jan 30, 2025 · Updated last year
- Optimizing Mobile Deep Learning on ARM GPU with TVM · ☆182 · Oct 15, 2018 · Updated 7 years ago
- Julia interface to GAlgebra via PyCall · ☆17 · Updated this week
- Run OpenCL program on MOBILE GPU (Qualcomm & ARM) ! · ☆19 · Jun 27, 2018 · Updated 7 years ago
- A prototype CUDA-to-OpenCL source-to-source translator, built on the Clang compiler framework · ☆208 · Jul 12, 2020 · Updated 5 years ago
- OpenCL library to train deep convolutional neural networks · ☆879 · Jan 5, 2018 · Updated 8 years ago
- ☆37 · Jan 21, 2018 · Updated 8 years ago