HMUNACHI / cuda-repo

From zero to hero: CUDA for accelerating maths and machine learning on GPU.

☆180

Alternatives and similar repositories for cuda-repo:

Users interested in cuda-repo are comparing it to the repositories listed below:

lucasdelimanogueira / PyNorch
Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!)
☆145 · Updated 9 months ago
rkinas / cuda-learning
This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mast…
☆302 · Updated last month
CisMine / Guide-NVIDIA-Tools
NVIDIA tools guide
☆117 · Updated 2 months ago
h3ct0rjs / HighPerformanceComputing
Class of High Performance Computing taken at U.T.P 2017
☆51 · Updated 7 years ago
drkennetz / cuda_examples
Some CUDA example code with READMEs.
☆90 · Updated 3 weeks ago
hkproj / triton-flash-attention
☆136 · Updated 2 months ago
leimao / CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
☆173 · Updated 8 months ago
gpu-mode / awesomeMLSys
An ML Systems Onboarding list
☆734 · Updated 2 months ago
gpu-mode / profiling-cuda-in-torch
☆151 · Updated last year
rkinas / triton-resources
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
☆306 · Updated last week
linjames0 / Transformer-CUDA
An implementation of the transformer architecture onto an Nvidia CUDA kernel
☆174 · Updated last year
CisMine / Parallel-Computing-Cuda-C
CUDA Learning guide
☆346 · Updated 9 months ago
NVIDIA / accelerated-computing-hub
NVIDIA curated collection of educational resources related to general purpose GPU programming.
☆318 · Updated this week
mlops-discord / gpu-optimization-workshop
Slides, notes, and materials for the workshop
☆321 · Updated 9 months ago
nvixnu / pmpp__programming_massively_parallel_processors
Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T…
☆65 · Updated 4 years ago
gpu-mode / triton-index
Cataloging released Triton kernels.
☆204 · Updated 2 months ago
clu0 / unet.cu
UNet diffusion model in pure CUDA
☆600 · Updated 8 months ago
tgautam03 / xGeMM
Accelerated General (FP32) Matrix Multiplication from scratch in CUDA
☆106 · Updated 2 months ago
eduardoleao052 / Autograd-from-scratch
Documented and Unit Tested educational Deep Learning framework with Autograd from scratch.
☆111 · Updated 11 months ago
puttsk / cuda-tutorial
A set of hands-on tutorials for CUDA programming
☆217 · Updated 11 months ago
unixpickle / learn-ptx
Learning about CUDA by writing PTX code.
☆124 · Updated last year
gevtushenko / llm.c
LLM training in simple, raw C/CUDA
☆92 · Updated 10 months ago
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆524 · Updated last month
salykova / matmul.c
Multi-Threaded FP32 Matrix Multiplication on x86 CPUs
☆341 · Updated last month