bkvogel / metal_performance_testing
Scientific computing with Metal in C++: Matrix multiplication example
☆45Updated 3 years ago
Alternatives and similar repositories for metal_performance_testing
Users who are interested in metal_performance_testing are comparing it to the libraries listed below.
- Metal Shading Language on Apple M1's GPU for scientific C++.☆104Updated 2 years ago
- Running linear algebra as fast as possible on Apple silicon☆27Updated 2 years ago
- Study and Implementations of Numerical Algorithms on Apple M1 and A* Devices☆149Updated 2 years ago
- Kernel Tuner☆372Updated 2 weeks ago
- GPUOcelot: A dynamic compilation framework for PTX☆216Updated 9 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo☆137Updated last week
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators☆121Updated 2 weeks ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo☆198Updated last week
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme☆92Updated 8 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo☆115Updated last week
- ☆41Updated 4 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo☆255Updated this week
- Examples for HIP☆212Updated 11 months ago
- Emulating double-precision arithmetic on Apple GPUs☆55Updated 2 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo☆178Updated this week
- High-Performance SGEMM on CUDA devices☆112Updated 10 months ago
- Software library for FDTD of viscoelastic equation using a staggered grid arrangement with support for GPU and CPU backends☆56Updated last week
- Online CUDA Occupancy Calculator☆80Updated 4 years ago
- Stepwise optimizations of DGEMM on CPU, reaching performance faster than Intel MKL eventually, even under multithreading.☆155Updated 3 years ago
- Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben A…☆282Updated 8 months ago
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement.☆260Updated 10 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo☆132Updated last week
- AMD lab notes with code examples to demonstrate use of AMD GPUs☆108Updated last year
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.☆135Updated 2 years ago
- ☆85Updated 3 weeks ago
- AMD’s C++ library for accelerating tensor primitives☆46Updated this week
- collection of benchmarks to measure basic GPU capabilities☆459Updated last month
- [DEPRECATED] Moved to ROCm/rocm-systems repo☆165Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo☆389Updated this week
- ☆157Updated this week