bkvogel / metal_performance_testing
Scientific computing with Metal in C++: Matrix multiplication example
☆43 · Updated 3 years ago
Alternatives and similar repositories for metal_performance_testing
Users who are interested in metal_performance_testing compare it to the libraries listed below.
- Metal Shading Language on Apple M1's GPU for scientific C++. ☆101 · Updated 2 years ago
- Study and Implementations of Numerical Algorithms on Apple M1 and A* Devices ☆145 · Updated 2 years ago
- Kernel Tuner ☆371 · Updated this week
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆119 · Updated 5 months ago
- Running linear algebra as fast as possible on Apple silicon ☆25 · Updated 2 years ago
- GPUOcelot: A dynamic compilation framework for PTX ☆211 · Updated 8 months ago
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆260 · Updated 9 months ago
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆466 · Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆136 · Updated this week
- Examples for HIP ☆211 · Updated 10 months ago
- A C++ wrapper for the Apple metal-cpp library to make it easier to run compute kernels on the GPU ☆10 · Updated 4 months ago
- High-Performance SGEMM on CUDA devices ☆107 · Updated 9 months ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆359 · Updated this week
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆90 · Updated 7 months ago
- Apple GPU microarchitecture ☆557 · Updated last year
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆254 · Updated this week
- Emulating double-precision arithmetic on Apple GPUs ☆55 · Updated 2 years ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆208 · Updated 11 months ago
- ☆80 · Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆198 · Updated this week
- A python library to run metal compute kernels on macOS ☆85 · Updated 9 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆115 · Updated this week
- Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben A… ☆281 · Updated 7 months ago
- ☆41 · Updated 4 years ago
- CUDA Kernel Benchmarking Library ☆757 · Updated 2 weeks ago
- An implementation of HIP that works on CPUs, across OSes. ☆127 · Updated last year
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆98 · Updated 2 weeks ago
- ☆157 · Updated last week
- My notes on various HPC papers. ☆23 · Updated 2 years ago
- A profiler to disclose and quantify hardware features on GPUs. ☆174 · Updated 3 years ago