CisMine / Parallel-Computing-Cuda-C
CUDA Learning guide
☆323 Updated 8 months ago
Alternatives and similar repositories for Parallel-Computing-Cuda-C:
Users who are interested in Parallel-Computing-Cuda-C compare it to the repositories listed below.
- NVIDIA tools guide ☆101 Updated last month
- Read custom dataset ☆11 Updated last year
- GPU programming related news and material links ☆1,368 Updated last month
- Fast CUDA matrix multiplication from scratch ☆632 Updated last year
- ☆123 Updated 6 months ago
- Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/) ☆699 Updated 6 months ago
- UNet diffusion model in pure CUDA ☆598 Updated 7 months ago
- Examples from Programming in Parallel with CUDA ☆122 Updated last year
- ☆218 Updated last month
- ☆142 Updated last year
- An implementation of the transformer architecture as an NVIDIA CUDA kernel ☆169 Updated last year
- Fastest kernels written from scratch ☆170 Updated this week
- A C/C++ implementation of micrograd: a tiny autograd engine with a neural net on top. ☆63 Updated last year
- CUDA Matrix Multiplication Optimization ☆161 Updated 7 months ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆269 Updated this week
- ☆876 Updated last month
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) ☆144 Updated 8 months ago
- Step-by-step optimization of CUDA SGEMM ☆284 Updated 2 years ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆514 Updated this week
- Tutorials on tinygrad ☆341 Updated last week
- Puzzles for learning Triton ☆1,403 Updated 3 months ago
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T… ☆64 Updated 4 years ago
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆212 Updated 5 months ago
- High-Performance FP32 Matrix Multiplication on CPU ☆333 Updated this week
- From zero to hero CUDA for accelerating maths and machine learning on GPU. ☆175 Updated 6 months ago
- Learnings and programs related to CUDA ☆262 Updated this week
- An ML Systems Onboarding list ☆694 Updated 3 weeks ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆118 Updated last year
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆699 Updated last month