xrq-phys / blis_apple
BLIS fork with kernels for Apple M1; perhaps the first open-source BLAS with Apple Matrix Coprocessor (AMX) support.
☆33 · Updated 2 years ago
Alternatives and similar repositories for blis_apple:
Users who are interested in blis_apple are comparing it to the libraries listed below.
- Running linear algebra as fast as possible on Apple silicon ☆18 · Updated last year
- A translator from NVPTX to SPIR-V, modified from the LLVM-SPIR-V Translator. ☆36 · Updated 3 years ago
- FP64-equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆54 · Updated 4 months ago
- ☆17 · Updated 3 months ago
- ☆20 · Updated 3 years ago
- Study and Implementations of Numerical Algorithms on Apple M1 and A* Devices ☆128 · Updated 2 years ago
- A framework that supports executing unmodified CUDA source code on non-NVIDIA devices. ☆112 · Updated 2 weeks ago
- rocWMMA ☆97 · Updated this week
- Next generation LAPACK implementation for ROCm platform ☆97 · Updated this week
- AMD lab notes with code examples to demonstrate use of AMD GPUs ☆93 · Updated 6 months ago
- ROCm BLAS marshalling library ☆125 · Updated this week
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆78 · Updated this week
- ☆51 · Updated 5 years ago
- ☆59 · Updated last month
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆126 · Updated this week
- nvptx-tools: a collection of tools for use with nvptx-none GCC toolchains. ☆49 · Updated 4 months ago
- ROCm Device Libraries ☆98 · Updated 8 months ago
- ☆75 · Updated this week
- GPUOcelot: A dynamic compilation framework for PTX ☆157 · Updated 3 weeks ago
- ☆131 · Updated this week
- SYCL Reference Manual ☆27 · Updated 8 months ago
- ROCm's Thunk Interface ☆84 · Updated last month
- Assembler for NVIDIA Fermi. Imported from Google Code. ☆71 · Updated 9 years ago
- Simple OpenCL Samples that Build with Khronos Headers and Libs ☆96 · Updated this week
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆71 · Updated last year
- CAKE Library for constant-bandwidth matrix multiplication on CPUs ☆14 · Updated 9 months ago
- ROCm Platform Runtime: ROCr, an HPC-market-enhanced, HSA-based runtime ☆234 · Updated this week
- hipFFT is an FFT marshalling library. ☆57 · Updated this week
- A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆47 · Updated last year
- MatMul performance benchmarks for a single CPU core, comparing hand-engineered and codegen kernels. ☆127 · Updated last year