ENCCS / gpu-programming
Meta-GPU lesson covering general aspects of GPU programming as well as specific frameworks
☆72 · Updated 3 months ago
Alternatives and similar repositories for gpu-programming:
Users who are interested in gpu-programming are comparing it to the repositories listed below
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆51 · Updated 2 weeks ago
- GPUOcelot: A dynamic compilation framework for PTX ☆178 · Updated last month
- N-Ways to Multi-GPU Programming ☆18 · Updated last year
- LLM training in simple, raw C/CUDA ☆92 · Updated 10 months ago
- Examples from Programming in Parallel with CUDA ☆129 · Updated last year
- A hands-on introduction to tuning GPU kernels using Kernel Tuner https://github.com/KernelTuner/kernel_tuner/ ☆30 · Updated 6 months ago
- High-Performance SGEMM on CUDA devices ☆86 · Updated last month
- AMD lab notes with code examples to demonstrate use of AMD GPUs ☆95 · Updated 8 months ago
- SYCL Open Source Specification ☆130 · Updated this week
- Exploring the scalable matrix extension of the Apple M4 processor ☆165 · Updated 4 months ago
- All PDFs of Victor Eijkhout's Art of HPC books and courses ☆607 · Updated 11 months ago
- Advanced Profiling and Analytics for AMD Hardware ☆141 · Updated this week
- Information about many aspects of high-performance computing. Wiki content moved to ~/docs. ☆285 · Updated 3 weeks ago
- ROCm BLAS marshalling library ☆133 · Updated this week
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆75 · Updated last year
- An Online Deep Learning Interface for HPC programs on NVIDIA GPUs ☆162 · Updated last month
- Repository with examples and exercises for OLCF and AMD's HIP training series ☆15 · Updated last year
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆340 · Updated 3 weeks ago
- NVIDIA curated collection of educational resources related to general purpose GPU programming. ☆220 · Updated this week
- Public repository for The Art of HPC volume 1: Scientific Computing ☆54 · Updated 11 months ago
- This is a mirror of https://gitlab.inria.fr/starpu/starpu where our development happens, but contributions are welcome here too! ☆68 · Updated this week
- Main Book repository for the Parallel and High Performance Computing book, Manning Publications ☆197 · Updated 2 years ago
- Bandwidth test for ROCm ☆54 · Updated this week
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆214 · Updated 3 months ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 · Updated last year
- Compiler-agnostic metaprogramming library providing concepts, type operations and tuples for C++ and CUDA ☆84 · Updated 2 weeks ago
- SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) sy… ☆110 · Updated 2 months ago
- Nvidia Instruction Set Specification Generator ☆254 · Updated 8 months ago
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆200 · Updated 3 months ago
- Next generation LAPACK implementation for ROCm platform ☆99 · Updated this week