ENCCS / gpu-programming
Meta-GPU lesson covering general aspects of GPU programming as well as specific frameworks
☆82 Updated last month
Alternatives and similar repositories for gpu-programming
Users who are interested in gpu-programming are comparing it to the repositories listed below.
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆351 Updated last month
- LLM training in simple, raw C/CUDA ☆99 Updated last year
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆56 Updated last month
- Visualization of cache-optimized matrix multiplication ☆147 Updated 2 months ago
- All pdfs of Victor Eijkhout's Art of HPC books and courses ☆644 Updated last year
- NVIDIA tools guide ☆133 Updated 4 months ago
- GPU documentation for humans ☆65 Updated 3 weeks ago
- HIP Python Low-level Bindings ☆25 Updated 2 weeks ago
- Learning about CUDA by writing PTX code. ☆131 Updated last year
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆47 Updated last week
- Legate Sparse is a Legate library that aims to provide a distributed and accelerated drop-in replacement for the scipy.sparse library on … ☆21 Updated this week
- An Online Deep Learning Interface for HPC programs on NVIDIA GPUs ☆168 Updated 3 weeks ago
- Online compiler for HIP and NVIDIA® CUDA® code to WebGPU ☆150 Updated 4 months ago
- LLM inference in Fortran ☆58 Updated last year
- Fast GPT-2 inference written in Fortran ☆196 Updated 3 weeks ago
- NVIDIA Math Libraries for the Python Ecosystem ☆318 Updated 2 months ago
- A hands-on introduction to tuning GPU kernels using Kernel Tuner https://github.com/KernelTuner/kernel_tuner/ ☆31 Updated last month
- GPUOcelot: A dynamic compilation framework for PTX ☆192 Updated 3 months ago
- High-Performance SGEMM on CUDA devices ☆94 Updated 4 months ago
- Public repository for The Art of HPC volume 1: Scientific Computing ☆58 Updated last year
- Custom PTX Instruction Benchmark ☆126 Updated 3 months ago
- Advanced Profiling and Analytics for AMD Hardware ☆156 Updated this week
- Algebraic enhancements for GEMM & AI accelerators ☆277 Updated 3 months ago
- Public repository for vol 2 of The Art of HPC: parallel programming ☆84 Updated last month
- Implementation of a parallel least squares support vector machine using multiple backends for different GPU vendors. ☆37 Updated this week
- ☆54 Updated this week
- CUDA Guide ☆66 Updated last year
- Exploring the scalable matrix extension of the Apple M4 processor ☆176 Updated 6 months ago
- Machine Learning with Symbolic Tensors ☆278 Updated last week
- pytorch from scratch in pure C/CUDA and python ☆40 Updated 7 months ago