ENCCS / gpu-programming
Meta-GPU lesson covering general aspects of GPU programming as well as specific frameworks
☆74 Updated 5 months ago
Alternatives and similar repositories for gpu-programming:
Users who are interested in gpu-programming are comparing it to the repositories listed below
- LLM training in simple, raw C/CUDA ☆92 Updated 11 months ago
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆44 Updated this week
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆29 Updated 3 weeks ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆348 Updated this week
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆51 Updated this week
- High-Performance SGEMM on CUDA devices ☆90 Updated 3 months ago
- GPU documentation for humans ☆44 Updated this week
- Visualization of cache-optimized matrix multiplication ☆120 Updated last month
- A hands-on introduction to tuning GPU kernels using Kernel Tuner https://github.com/KernelTuner/kernel_tuner/ ☆30 Updated 2 weeks ago
- GPUOcelot: A dynamic compilation framework for PTX ☆187 Updated 2 months ago
- N-Ways to Multi-GPU Programming ☆21 Updated 2 years ago
- An Online Deep Learning Interface for HPC programs on NVIDIA GPUs ☆165 Updated last week
- Nvidia Instruction Set Specification Generator ☆256 Updated 9 months ago
- This is a mirror of https://gitlab.inria.fr/starpu/starpu where our development happens, but contributions are welcome here too! ☆70 Updated this week
- Online compiler for HIP and NVIDIA® CUDA® code to WebGPU ☆144 Updated 3 months ago
- NVIDIA Math Libraries for the Python Ecosystem ☆297 Updated last month
- All pdfs of Victor Eijkhout's Art of HPC books and courses ☆627 Updated last year
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆205 Updated 3 weeks ago
- Advanced Profiling and Analytics for AMD Hardware ☆148 Updated this week
- NVIDIA tools guide ☆129 Updated 3 months ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆171 Updated 5 months ago
- Public repository for vol 2 of The Art of HPC: parallel programming ☆82 Updated 3 weeks ago
- Public repository for The Art of HPC volume 1: Scientific Computing ☆57 Updated last year
- The CUDA target for Numba ☆106 Updated this week
- ☆133 Updated last year
- SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) sy… ☆112 Updated 3 months ago
- Kernel Tuner ☆328 Updated last week
- AMD lab notes with code examples to demonstrate use of AMD GPUs ☆97 Updated 9 months ago
- throwaway GPT inference ☆138 Updated 10 months ago
- Algebraic enhancements for GEMM & AI accelerators ☆274 Updated last month