Code appendix to an OpenCL matrix-multiplication tutorial
☆179 · Feb 7, 2017 · Updated 9 years ago
Alternatives and similar repositories for myGEMM
Users that are interested in myGEMM are comparing it to the libraries listed below.
- Tuned OpenCL BLAS (☆1,179 · Apr 13, 2026 · Updated 2 months ago)
- CLTune: An automatic OpenCL & CUDA kernel tuner (☆185 · Dec 12, 2022 · Updated 3 years ago)
- a software library containing BLAS functions written in OpenCL (☆864 · Aug 2, 2024 · Updated last year)
- A portable high-level API with CUDA or OpenCL back-end (☆56 · Oct 8, 2017 · Updated 8 years ago)
- Sample program to compare calculation performance between CPU and GPU (☆16 · Oct 27, 2016 · Updated 9 years ago)
- Assembler for NVIDIA Maxwell architecture (☆1,070 · Jan 3, 2023 · Updated 3 years ago)
- A synthetic micro-benchmark that measures peak compute, bandwidth, and matrix throughput of GPUs and CPUs (☆498 · Updated this week)
- The repository targets the OpenCL gemm function performance optimization. It compares several libraries clBLAS, clBLAST, MIOpenGemm, Inte… (☆17 · Mar 28, 2019 · Updated 7 years ago)
- Open single and half precision gemm implementations (☆397 · Apr 2, 2023 · Updated 3 years ago)
- OpenCL tool to detect buffer overflows in GPU kernels (☆23 · Jan 7, 2019 · Updated 7 years ago)
- Sequential and parallel GEMM implementations with C interface + Benchmark. (☆12 · May 24, 2016 · Updated 10 years ago)
- Learn OpenCL step by step. (☆140 · Aug 30, 2022 · Updated 3 years ago)
- ☆2,015 · Jul 29, 2023 · Updated 2 years ago
- Caffe deep learning framework - optimized for Xeon Phi (☆14 · May 12, 2015 · Updated 11 years ago)
- FFI for OpenCL (☆12 · Dec 19, 2015 · Updated 10 years ago)
- Sparse Matrix-Vector Multiplication implementations in C (☆22 · Dec 7, 2022 · Updated 3 years ago)
- assembler for NVIDIA FERMI. Imported from Google Code (☆77 · Mar 22, 2015 · Updated 11 years ago)
- Efficient SpGEMM on GPU using CUDA and CSR (☆61 · Jul 18, 2023 · Updated 2 years ago)
- Materials for workshop on GPU computation for statistics, data science, machine learning applications. (☆14 · Sep 8, 2016 · Updated 9 years ago)
- Winograd-based convolution implementation in OpenCL (☆29 · Jan 22, 2017 · Updated 9 years ago)
- A simple example of using the SDAccel build flow for AWS EC2's F1 instance type. Tries to avoid magic makefiles. (☆10 · Aug 27, 2017 · Updated 8 years ago)
- This is a tuned sparse matrix dense vector multiplication (SpMV) library (☆23 · Mar 21, 2016 · Updated 10 years ago)
- ☆27 · Oct 26, 2019 · Updated 6 years ago
- New batched algorithm for sparse matrix-matrix multiplication (SpMM) (☆16 · May 7, 2019 · Updated 7 years ago)
- OpenCL API, OpenCL C, Extensions, SPIR-V Environment Specs, Ref page, and C++ for OpenCL doc sources. (☆413 · Jun 3, 2026 · Updated 2 weeks ago)
- Mirror of http://gitlab.hpcrl.cse.ohio-state.edu/chong/ppopp19_ae, refactoring for understanding (☆17 · Oct 20, 2021 · Updated 4 years ago)
- load word embeddings to Torch.Tensor (☆14 · May 12, 2016 · Updated 10 years ago)
- An implementation of SGEMV with performance comparable to cuBLAS. (☆12 · May 21, 2021 · Updated 5 years ago)
- Benchmark for Co-running Single Applications on Integrated Architectures (☆12 · Jul 7, 2016 · Updated 9 years ago)
- Cayley Dickson algebra implementation in python (☆12 · Jan 3, 2019 · Updated 7 years ago)
- a heterogeneous multiGPU level-3 BLAS library (☆46 · Dec 9, 2019 · Updated 6 years ago)
- Lecture Slide Issue Tracking (☆258 · May 20, 2018 · Updated 8 years ago)
- OpenCL memory benchmark (☆15 · Dec 21, 2016 · Updated 9 years ago)
- MAFIA: Multiple Application Framework for GPU architectures (☆28 · Jan 21, 2022 · Updated 4 years ago)
- Caffe: a fast open framework for deep learning. (☆14 · Aug 26, 2015 · Updated 10 years ago)
- An OpenCL device simulator and debugger (☆372 · Mar 24, 2026 · Updated 2 months ago)
- OpenCL Programming Examples (☆22 · Jul 21, 2018 · Updated 7 years ago)
- The OpenDwarfs project provides a benchmark suite consisting of different computation/communication idioms, i.e., dwarfs, for state-of-ar… (☆102 · Sep 18, 2019 · Updated 6 years ago)
- ☆256 · Sep 15, 2023 · Updated 2 years ago