HMUNACHI / henry-vjp

From zero to hero CUDA for accelerating maths and machine learning on GPU.

☆181

Alternatives and similar repositories for henry-vjp:

Users who are interested in henry-vjp are comparing it to the repositories listed below.

gpu-mode / profiling-cuda-in-torch
☆152 Updated last year
MekkCyber / CutlassAcademy
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
☆154 Updated last week
CisMine / Guide-NVIDIA-Tools
NVIDIA tools guide
☆119 Updated 2 months ago
leimao / CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
☆177 Updated 8 months ago
lucasdelimanogueira / PyNorch
Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!)
☆146 Updated 9 months ago
nvixnu / pmpp__programming_massively_parallel_processors
Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T…
☆66 Updated 4 years ago
rkinas / cuda-learning
This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mast…
☆308 Updated last month
hkproj / triton-flash-attention
☆142 Updated 2 months ago
rkinas / triton-resources
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
☆318 Updated 3 weeks ago
CisMine / Parallel-Computing-Cuda-C
CUDA Learning guide
☆349 Updated 9 months ago
drkennetz / cuda_examples
Some CUDA example code with READMEs.
☆93 Updated 3 weeks ago
a-hamdi / GPU
100 days of building GPU kernels!
☆321 Updated this week
SzymonOzog / GPU_Programming
☆47 Updated this week
clu0 / unet.cu
UNet diffusion model in pure CUDA
☆600 Updated 9 months ago
gpu-mode / awesomeMLSys
An ML Systems Onboarding list
☆743 Updated 2 months ago
andrewkchan / yalm
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
☆274 Updated 2 months ago
unixpickle / learn-ptx
Learning about CUDA by writing PTX code.
☆125 Updated last year
gpu-mode / triton-index
Cataloging released Triton kernels.
☆212 Updated 2 months ago
CisMine / GPU-in-ML-DL
Apply GPU in ML and DL
☆48 Updated last month
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆169 Updated last week
salykova / sgemm.cu
High-Performance SGEMM on CUDA devices
☆87 Updated 2 months ago
mit-han-lab / parallel-computing-tutorial
☆160 Updated last year
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆524 Updated last month
linjames0 / Transformer-CUDA
An implementation of the transformer architecture onto an Nvidia CUDA kernel
☆177 Updated last year
h3ct0rjs / HighPerformanceComputing
Class of High Performance Computing taken at U.T.P 2017
☆53 Updated 7 years ago
genaibook / genaibook
Contains the public resources of Hands on GenAI book
☆120 Updated 2 months ago
salykova / matmul.c
Multi-Threaded FP32 Matrix Multiplication on x86 CPUs
☆343 Updated last month
Deep-Learning-Profiling-Tools / triton-viz
☆192 Updated this week
mlops-discord / gpu-optimization-workshop
Slides, notes, and materials for the workshop
☆321 Updated 10 months ago
mikeroyal / CUDA-Guide
CUDA Guide
☆63 Updated last year