bkvogel / metal_performance_testing
Scientific computing with Metal in C++: Matrix multiplication example
☆29 · Updated 2 years ago
Alternatives and similar repositories for metal_performance_testing:
Users interested in metal_performance_testing are comparing it to the repositories listed below:
- Metal Shading Language on Apple M1's GPU for scientific C++. ☆93 · Updated last year
- Study and Implementations of Numerical Algorithms on Apple M1 and A* Devices ☆139 · Updated 2 years ago
- Running linear algebra as fast as possible on Apple silicon ☆20 · Updated last year
- Examples for HIP ☆205 · Updated 5 months ago
- Next generation FFT implementation for ROCm ☆191 · Updated this week
- Next generation LAPACK implementation for ROCm platform ☆100 · Updated last week
- Emulating double-precision arithmetic on Apple GPUs ☆49 · Updated last year
- rocWMMA ☆110 · Updated last week
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆108 · Updated last week
- Reusable software components for ROCm developers ☆83 · Updated this week
- Next generation library for iterative sparse solvers for ROCm platform ☆81 · Updated last week
- An implementation of HIP that works on CPUs, across OSes. ☆116 · Updated last year
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆54 · Updated 2 weeks ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆60 · Updated last month
- hipFFT is an FFT marshalling library. ☆63 · Updated last week
- Kokkos C++ Performance Portability Programming Ecosystem: Profiling and Debugging Tools ☆122 · Updated last week
- A Python library to run Metal compute kernels on macOS ☆77 · Updated 3 months ago
- AMD’s C++ library for accelerating tensor primitives ☆39 · Updated this week
- ROCm Parallel Primitives ☆171 · Updated this week
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆90 · Updated last month
- ROCm Systems Profiler ☆17 · Updated this week
- C++ HPC Tutorial materials ☆49 · Updated 9 months ago
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆44 · Updated this week
- ROCm SPARSE marshalling library ☆67 · Updated this week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 · Updated last month
- RAND library for HIP programming language ☆118 · Updated last week
- Resources for the introductory GPU programming course of the HPC-AI specialized master's program ☆22 · Updated last year
- Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) ☆108 · Updated last year
- Next generation SPARSE implementation for ROCm platform ☆122 · Updated this week
- Compiler agnostic metaprogramming library providing concepts, type operations and tuples for C++ and CUDA ☆87 · Updated 2 weeks ago