ENCCS / gpu-programming
Meta-GPU lesson covering general aspects of GPU programming as well as specific frameworks
☆98 Updated last month
Alternatives and similar repositories for gpu-programming
Users who are interested in gpu-programming are comparing it to the repositories listed below
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆372 Updated 8 months ago
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆55 Updated last week
- LLM training in simple, raw C/CUDA ☆109 Updated last year
- High-Performance SGEMM on CUDA devices ☆115 Updated 11 months ago
- NVIDIA Math Libraries for the Python Ecosystem ☆541 Updated last month
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆65 Updated 2 months ago
- Custom PTX Instruction Benchmark ☆137 Updated 10 months ago
- Fast and Furious AMD Kernels ☆331 Updated 2 weeks ago
- Tensor library & inference framework for machine learning ☆118 Updated 3 months ago
- CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-base… ☆763 Updated 3 weeks ago
- A hands-on introduction to tuning GPU kernels using Kernel Tuner https://github.com/KernelTuner/kernel_tuner/ ☆36 Updated 2 months ago
- All pdfs of Victor Eijkhout's Art of HPC books and courses ☆762 Updated last year
- Quantum computing without the linear algebra ☆78 Updated last month
- The Foundation for All Legate Libraries ☆233 Updated 2 weeks ago
- HIP Python Low-level Bindings ☆33 Updated 2 months ago
- Legate Sparse is a Legate library that aims to provide a distributed and accelerated drop-in replacement for the scipy.sparse library on … ☆24 Updated 2 weeks ago
- C++ HPC Tutorial materials ☆54 Updated 2 months ago
- Learning about CUDA by writing PTX code. ☆151 Updated last year
- Visualization of cache-optimized matrix multiplication ☆157 Updated 9 months ago
- Competitive GPU kernel optimization platform. ☆144 Updated last week
- Public repository for vol 2 of The Art of HPC: parallel programming ☆90 Updated 3 months ago
- LLM inference in Fortran ☆65 Updated last year
- The CUDA target for Numba ☆239 Updated this week
- monorepo for rocm libraries ☆222 Updated this week
- Machine Learning for HPC Workflows ☆144 Updated last month
- An Online Deep Learning Interface for HPC programs on NVIDIA GPUs ☆177 Updated this week
- Kernel Tuner ☆378 Updated 3 weeks ago
- Public repository for The Art of HPC volume 1: Scientific Computing ☆64 Updated last year
- GPUOcelot: A dynamic compilation framework for PTX ☆219 Updated 11 months ago
- Fast GPT-2 inference written in Fortran ☆203 Updated 3 months ago