bkvogel / metal_performance_testing
Scientific computing with Metal in C++: Matrix multiplication example
☆26 Updated 2 years ago
Alternatives and similar repositories for metal_performance_testing:
Users who are interested in metal_performance_testing are comparing it to the libraries listed below.
- Metal Shading Language on Apple M1's GPU for scientific C++. ☆85 Updated last year
- Running linear algebra as fast as possible on Apple silicon ☆18 Updated last year
- oneAPI Math Library (oneMath) ☆636 Updated last week
- Next generation LAPACK implementation for ROCm platform ☆97 Updated this week
- Study and Implementations of Numerical Algorithms on Apple M1 and A* Devices ☆129 Updated 2 years ago
- Examples for HIP ☆203 Updated last month
- Distributed multigrid linear solver library on GPU ☆517 Updated 5 months ago
- Supernodal sparse direct solver. https://portal.nersc.gov/project/sparse/superlu/ ☆288 Updated last month
- ROCm Parallel Primitives ☆168 Updated this week
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆262 Updated last week
- C++ HPC Tutorial materials ☆48 Updated 6 months ago
- Portable and vendor neutral framework for parallel programming on heterogeneous platforms. ☆406 Updated 2 months ago
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆45 Updated 3 months ago
- AOMP is an open source Clang/LLVM based compiler with added support for the OpenMP® API on Radeon™ GPUs. Use this repository for releas… ☆210 Updated this week
- Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels ☆320 Updated this week
- AMD lab notes with code examples to demonstrate use of AMD GPUs ☆93 Updated 6 months ago
- Performance-portable library for particle-based simulations ☆220 Updated last month
- Reference Implementation for stdBLAS ☆131 Updated last week
- GPUOcelot: A dynamic compilation framework for PTX ☆157 Updated 3 weeks ago
- Next generation library for iterative sparse solvers for ROCm platform ☆79 Updated this week
- Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben A… ☆258 Updated 2 weeks ago
- Reusable software components for ROCm developers ☆81 Updated last week
- Next generation FFT implementation for ROCm ☆184 Updated last week
- RAJA Performance Suite ☆117 Updated this week
- RAJA Performance Portability Layer (C++) ☆499 Updated this week
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆104 Updated last week
- Next generation SPARSE implementation for ROCm platform ☆118 Updated this week
- CS infrastructure components for HPC applications ☆162 Updated this week
- SYCL Open Source Specification ☆122 Updated this week
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆206 Updated last month