ENCCS / gpu-programming
Meta-GPU lesson covering general aspects of GPU programming as well as specific frameworks
☆86 Updated last month
Alternatives and similar repositories for gpu-programming
Users who are interested in gpu-programming are comparing it to the repositories listed below.
- LLM training in simple, raw C/CUDA ☆99 Updated last year
- Custom PTX Instruction Benchmark ☆126 Updated 3 months ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆352 Updated 2 months ago
- Tensor library & inference framework for machine learning ☆97 Updated this week
- High-Performance SGEMM on CUDA devices ☆95 Updated 5 months ago
- GPU documentation for humans ☆70 Updated last week
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆56 Updated 2 months ago
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆47 Updated this week
- All pdfs of Victor Eijkhout's Art of HPC books and courses ☆659 Updated last year
- Learning about CUDA by writing PTX code. ☆132 Updated last year
- Quantum computing without the linear algebra ☆64 Updated last week
- NVIDIA tools guide ☆135 Updated 5 months ago
- NVIDIA Math Libraries for the Python Ecosystem ☆330 Updated 2 weeks ago
- GPUOcelot: A dynamic compilation framework for PTX ☆195 Updated 4 months ago
- Public repository for vol 2 of The Art of HPC: parallel programming ☆86 Updated 2 weeks ago
- Visualization of cache-optimized matrix multiplication ☆149 Updated 3 months ago
- LLM inference in Fortran ☆59 Updated last year
- Public repository for The Art of HPC volume 1: Scientific Computing ☆59 Updated last year
- General Matrix Multiplication using NVIDIA Tensor Cores ☆18 Updated 5 months ago
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆50 Updated last week
- N-Ways to Multi-GPU Programming ☆34 Updated 2 years ago
- Legate Sparse is a Legate library that aims to provide a distributed and accelerated drop-in replacement for the scipy.sparse library on … ☆22 Updated last week
- Nvidia Instruction Set Specification Generator ☆278 Updated 11 months ago
- ☆134 Updated 2 years ago
- Examples from Programming in Parallel with CUDA ☆153 Updated 2 years ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆180 Updated 7 months ago
- A hands-on introduction to tuning GPU kernels using Kernel Tuner https://github.com/KernelTuner/kernel_tuner/ ☆31 Updated 2 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆106 Updated this week
- Reference Kernels for the Leaderboard ☆60 Updated last week
- An Online Deep Learning Interface for HPC programs on NVIDIA GPUs ☆169 Updated last week