BLAS OpenCL implementation.
☆16Apr 8, 2015Updated 10 years ago
Alternatives and similar repositories for clOpenBLAS
Users who are interested in clOpenBLAS are comparing it to the libraries listed below.
- flexible-gemm conv of deepcore☆17Dec 2, 2019Updated 6 years ago
- The demo projects for Allwinner D1 SBC☆24Sep 7, 2021Updated 4 years ago
- Profiling GEMM on Android☆10Apr 1, 2016Updated 9 years ago
- Build-to-Order BLAS☆12Apr 9, 2019Updated 6 years ago
- A portable high-level API with CUDA or OpenCL back-end☆56Oct 8, 2017Updated 8 years ago
- General Stride K-Nearest Neighbors☆14Jun 15, 2021Updated 4 years ago
- Fast parallel CTC.☆10Apr 7, 2017Updated 8 years ago
- A pattern-based algorithmic autotuner for graph processing on GPUs.☆32Jun 25, 2025Updated 8 months ago
- Memory System Microbenchmarks☆65Feb 9, 2023Updated 3 years ago
- D BLAS header. Works with OpenBLAS.☆13Mar 20, 2023Updated 3 years ago
- Cute layout visualization☆33Jan 18, 2026Updated 2 months ago
- A managed platform and language for GPGPU☆32Dec 3, 2012Updated 13 years ago
- Strassen's Algorithm for Tensor Contraction☆15Jul 7, 2017Updated 8 years ago
- Fast SIMD alpha overlay and blending for Raspberry Pi and other ARM systems.☆23Jul 26, 2020Updated 5 years ago
- Generating Families of Practical Fast Matrix Multiplication Algorithms☆12Jul 7, 2017Updated 8 years ago
- Heterogeneous Run Time version of Caffe. Adds heterogeneous capabilities to Caffe, using heterogeneous computing infrastructure frame…☆269Oct 16, 2018Updated 7 years ago
- Parallel implementation of k-means clustering using MPI4PY and PyCUDA.☆10Mar 11, 2019Updated 7 years ago
- A DMD-like wrapper for GDC.☆21Dec 29, 2025Updated 2 months ago
- [ICDCS 2023] DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining☆12Dec 4, 2023Updated 2 years ago
- Xception V1 model in Tensorflow with pretrained weights on ImageNet☆13Apr 9, 2018Updated 7 years ago
- New batched algorithm for sparse matrix-matrix multiplication (SpMM)☆16May 7, 2019Updated 6 years ago
- A simple Rust crate to cache data both in-memory and on disk☆11Dec 26, 2021Updated 4 years ago
- Multi-GPU CUDA based scheduler.☆13Jul 20, 2017Updated 8 years ago
- A cross-platform desktop/mobile UI engine written in D using dsfml☆13Dec 10, 2016Updated 9 years ago
- My LeetCode Solutions in Java☆25Oct 25, 2014Updated 11 years ago
- Old repository; now lives in https://github.com/concourse/concourse☆11Mar 24, 2022Updated 3 years ago
- Instructions and templates for SC authors☆17Aug 22, 2021Updated 4 years ago
- Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs☆16Feb 28, 2019Updated 7 years ago
- USB 2.0 data types☆13Mar 16, 2021Updated 5 years ago
- A simple integrated container orchestration solution☆13Dec 2, 2022Updated 3 years ago
- Automated machine learning as an AI-HPC benchmark☆65Jul 19, 2022Updated 3 years ago
- Sparse matrix-matrix multiplication on CPU+GPU systems.☆13Mar 17, 2014Updated 12 years ago
- A small, personal PaaS☆15Apr 11, 2024Updated last year
- Scott Gray's maxas assembler SGEMM, explaining the (for me) missing parts: https://github.com/NervanaSystems/maxas ☆17Dec 22, 2018Updated 7 years ago