philipturner / amx-benchmarks
Running linear algebra as fast as possible on Apple silicon
☆21Updated last year
Alternatives and similar repositories for amx-benchmarks
Users who are interested in amx-benchmarks are comparing it to the repositories listed below
- ☆27Updated 4 months ago
- GPUOcelot: A dynamic compilation framework for PTX☆207Updated 6 months ago
- A framework that supports executing unmodified CUDA source code on non-NVIDIA devices.☆132Updated 7 months ago
- Study and Implementations of Numerical Algorithms on Apple M1 and A* Devices☆144Updated 2 years ago
- rocWMMA☆121Updated this week
- BLIS fork with kernels for Apple M1. (Perhaps) The first open-source BLAS with Apple Matrix Coprocessor support.☆35Updated 2 years ago
- Emulating double-precision arithmetic on Apple GPUs☆55Updated 2 years ago
- Scientific computing with Metal in C++: Matrix multiplication example☆36Updated 2 years ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators☆110Updated 2 months ago
- Benchmark for measuring the performance of sparse and irregular memory access.☆78Updated 3 months ago
- Exploring the scalable matrix extension of the Apple M4 processor☆193Updated 9 months ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo☆162Updated this week
- Tenstorrent MLIR compiler☆169Updated this week
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain.☆138Updated last week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.☆55Updated 4 months ago
- RCCL Performance Benchmark Tests☆71Updated last week
- Unofficial description of the CUDA assembly (SASS) instruction sets.☆132Updated 3 weeks ago
- The University of Bristol HPC Simulation Engine☆99Updated 3 weeks ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores☆40Updated last year
- Trying to figure various CPU things out☆82Updated last year
- ☆148Updated this week
- ☆62Updated 7 months ago
- MatMul performance benchmarks for a single CPU core, comparing hand-engineered and codegen kernels.☆134Updated last year
- Apple GPU microarchitecture☆540Updated 10 months ago
- chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.☆295Updated this week
- High Performance Linpack for Next-Generation AMD HPC Accelerators☆60Updated 3 weeks ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo☆173Updated last week
- CUDA Flux is a profiler for GPU applications that reports the basic block execution frequencies of compute kernels☆32Updated 4 years ago
- 🎃 GPU load-balancing library for regular and irregular computations.☆62Updated last year
- CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.☆120Updated 2 years ago