philipturner / metal-float64
Emulating double-precision arithmetic on Apple GPUs
☆43 · Updated last year
Related projects:
- Running linear algebra as fast as possible on Apple silicon ☆18 · Updated last year
- BLIS fork with kernels for Apple M1. (Perhaps) The first open-source BLAS with Apple Matrix Coprocessor support. ☆33 · Updated last year
- Study and Implementations of Numerical Algorithms on Apple M1 and A* Devices ☆120 · Updated last year
- Print all known information about the GPU on Apple-designed chips ☆59 · Updated 3 weeks ago
- The missing OpenCL 3.0 driver for macOS ☆12 · Updated last year
- Library to manipulate Apple Metal Shading Language IR ☆47 · Updated last year
- Software library for FDTD of viscoelastic equation using a staggered grid arrangement with support for GPU and CPU backends ☆52 · Updated 2 months ago
- An implementation of HIP that works on CPUs, across OSes. ☆109 · Updated 6 months ago
- Tensor Tiling Library ☆33 · Updated 3 weeks ago
- A python library to run metal compute kernels on macOS ☆67 · Updated 10 months ago
- SYCL Conformance Tests ☆60 · Updated last week
- Next generation LAPACK implementation for ROCm platform ☆91 · Updated this week
- AMD lab notes with code examples to demonstrate use of AMD GPUs ☆89 · Updated 2 months ago
- Next generation library for iterative sparse solvers for ROCm platform ☆74 · Updated this week
- ☆15 · Updated last year
- AMD’s C++ library for accelerating tensor primitives ☆35 · Updated this week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆39 · Updated 8 months ago
- Next generation FFT implementation for ROCm ☆173 · Updated this week
- hipFFT is an FFT marshalling library. ☆52 · Updated this week
- Synchronous, single-threaded, library-only SYCL implementation for debugging and verification. ☆25 · Updated this week
- OpenCL/SPIR-V implementation of HIP ☆104 · Updated last year
- Counter-based random number generators for C, C++ and CUDA. ☆85 · Updated 7 months ago
- Compiler-agnostic metaprogramming library providing concepts, type operations and tuples for C++ and CUDA ☆78 · Updated last month
- portFFT is a library implementing Fast Fourier Transforms using SYCL ☆14 · Updated last week
- GPUOcelot: A dynamic compilation framework for PTX ☆136 · Updated 3 months ago
- SYCL Open Source Specification ☆109 · Updated this week
- SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) sy… ☆84 · Updated 2 months ago
- Experimental OpenCL SPIR-V to OpenCL C translator ☆24 · Updated last week
- List all available information about all SYCL devices and platforms ☆15 · Updated 4 years ago
- Atomistic Spin Simulation Framework ☆63 · Updated 3 years ago