bkvogel / metal_performance_testing
Scientific computing with Metal in C++: Matrix multiplication example
☆29Updated 2 years ago
Alternatives and similar repositories for metal_performance_testing
Users who are interested in metal_performance_testing are comparing it to the libraries listed below.
- Metal Shading Language on Apple M1's GPU for scientific C++.☆93Updated last year
- Study and Implementations of Numerical Algorithms on Apple M1 and A* Devices☆141Updated 2 years ago
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems.☆56Updated last month
- Next generation LAPACK implementation for ROCm platform☆101Updated this week
- SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) sy…☆119Updated last week
- AMD’s C++ library for accelerating tensor primitives☆41Updated this week
- ROCm Thrust - run Thrust dependent software on AMD GPUs☆120Updated this week
- Running linear algebra as fast as possible on Apple silicon☆20Updated last year
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme☆70Updated 2 months ago
- ☆29Updated 5 years ago
- Kokkos C++ Performance Portability Programming Ecosystem: Profiling and Debugging Tools☆126Updated 3 weeks ago
- Next generation library for iterative sparse solvers for ROCm platform☆81Updated last week
- CS infrastructure components for HPC applications☆172Updated last week
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.☆47Updated last week
- Reusable software components for ROCm developers☆84Updated this week
- Compiler agnostic metaprogramming library providing concepts, type operations and tuples for C++ and cuda☆87Updated 3 weeks ago
- Generate simple index ranges in C++ and CUDA C++☆39Updated last year
- Software library for FDTD of viscoelastic equation using a staggered grid arrangement with support for GPU and CPU backends☆56Updated 2 months ago
- Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH)☆108Updated 2 years ago
- RAJA Performance Suite☆117Updated last week
- ☆57Updated 3 weeks ago
- An Adaptive Pencil Decomposition Library for NVIDIA GPUs☆62Updated last month
- ☆38Updated last month
- Emulating double-precision arithmetic on Apple GPUs☆52Updated 2 years ago
- C++ HPC Tutorial materials☆50Updated 10 months ago
- hipFFT is an FFT marshalling library.☆63Updated this week
- Next generation FFT implementation for ROCm☆196Updated this week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.☆52Updated 2 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators☆95Updated 2 weeks ago
- An implementation of HIP that works on CPUs, across OSes.☆120Updated last year