xrq-phys / blis_apple
BLIS fork with kernels for Apple M1. Perhaps the first open-source BLAS with Apple Matrix Coprocessor support.
☆32Updated last year
Related projects
Alternatives and complementary repositories for blis_apple
- Running linear algebra as fast as possible on Apple silicon☆18Updated last year
- A translator from NVPTX to SPIR-V, modified from the LLVM-SPIR-V Translator.☆33Updated 3 years ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme☆46Updated 2 months ago
- Study and Implementations of Numerical Algorithms on Apple M1 and A* Devices☆124Updated last year
- rocWMMA☆92Updated this week
- Emulating double-precision arithmetic on Apple GPUs☆47Updated last year
- A framework that supports executing unmodified CUDA source code on non-NVIDIA devices.☆105Updated 3 months ago
- This repository targets performance optimization of the OpenCL gemm function. It compares several libraries: clBLAS, clBLAST, MIOpenGemm, Inte…☆16Updated 5 years ago
- CuPBoP-AMD is a CUDA translator that translates CUDA programs at NVVM IR level to HIP-compatible IR that can run on AMD GPUs.☆33Updated last year
- Bandwidth test for ROCm☆49Updated this week
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs☆75Updated last week
- AMD lab notes with code examples to demonstrate use of AMD GPUs☆91Updated 4 months ago
- ROCm BLAS marshalling library☆121Updated this week
- Next generation LAPACK implementation for ROCm platform☆94Updated this week
- Instruction latency & throughput profiler for AArch64☆32Updated 9 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators☆68Updated 10 months ago
- ROCm Thrust - run Thrust dependent software on AMD GPUs☆100Updated this week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.☆43Updated 10 months ago
- An extension library of WMMA API (Tensor Core API)☆84Updated 4 months ago
- BLAS implementation for Intel FPGA☆76Updated 4 years ago
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain.☆124Updated this week
- Intel® GPU Compute Samples☆97Updated this week
- Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen)☆34Updated last month
- Next generation library for iterative sparse solvers for ROCm platform☆76Updated this week
- flexible-gemm conv of deepcore☆17Updated 4 years ago
- Tensor Tiling Library☆33Updated 2 months ago