philipturner / amx-benchmarks
Running linear algebra as fast as possible on Apple silicon
☆20 · Updated last year
Alternatives and similar repositories for amx-benchmarks
Users who are interested in amx-benchmarks are comparing it to the libraries listed below.
- BLIS fork with kernels for Apple M1. (Perhaps) the first open-source BLAS with Apple Matrix Coprocessor support. ☆35 · Updated 2 years ago
- rocWMMA ☆115 · Updated last week
- ☆26 · Updated 2 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆98 · Updated last month
- Scientific computing with Metal in C++: Matrix multiplication example ☆31 · Updated 2 years ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆180 · Updated 7 months ago
- Advanced Profiling and Analytics for AMD Hardware ☆156 · Updated this week
- Bandwidth test for ROCm ☆58 · Updated last month
- An HPL-AI implementation for Fugaku ☆21 · Updated 3 years ago
- Tensor Tiling Library ☆36 · Updated 2 months ago
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆56 · Updated 2 months ago
- amdgpu example code in hip/asm ☆32 · Updated 2 weeks ago
- ☆62 · Updated 6 months ago
- Next-generation LAPACK implementation for the ROCm platform ☆103 · Updated this week
- High Performance Linpack for Next-Generation AMD HPC Accelerators ☆55 · Updated last week
- The translator that supports translating NVPTX to SPIR-V. This translator is modified from LLVM-SPIR-V Translator. ☆40 · Updated 3 years ago
- An implementation of HIP that works on CPUs, across OSes. ☆121 · Updated last year
- ROCm BLAS marshalling library ☆144 · Updated last week
- RCCL Performance Benchmark Tests ☆68 · Updated last month
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆90 · Updated this week
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆73 · Updated 3 months ago
- Benchmark for measuring the performance of sparse and irregular memory access. ☆78 · Updated last month
- SYCL Benchmark Suite ☆65 · Updated this week
- Study and Implementations of Numerical Algorithms on Apple M1 and A* Devices ☆144 · Updated 2 years ago
- ROC profiler library. Profiling with perf-counters and derived metrics. ☆148 · Updated last week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆55 · Updated 3 months ago
- ☆44 · Updated 4 years ago
- Compute Benchmarks for oneAPI Level Zero and OpenCL™ Driver ☆39 · Updated last week
- TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs) ☆41 · Updated last week
- ☆84 · Updated this week