rkinas / cuda-learning
This repository is a curated collection of resources, tutorials, and practical examples for mastering CUDA programming, whether you're just starting out or looking to optimize and scale your GPU-accelerated applications.
☆172Updated this week
Alternatives and similar repositories for cuda-learning:
Users who are interested in cuda-learning are comparing it to the repositories listed below.
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.☆171Updated this week
- ☆208Updated last week
- This repo has all the basics you'll need in order to understand the complete vision transformer architecture and its various implementa…☆196Updated 3 weeks ago
- Small autograd engine inspired by Karpathy's micrograd and PyTorch☆244Updated 2 months ago
- High Quality Resources on GPU Programming/Architecture☆578Updated 6 months ago
- The Tensor (or Array)☆420Updated 5 months ago
- Following master Karpathy with a GPT-2 implementation and training, writing lots of comments because I have the memory of a goldfish☆167Updated 5 months ago
- Learnings and programs related to CUDA☆115Updated last week
- UNet diffusion model in pure CUDA☆596Updated 7 months ago
- An implementation of the transformer architecture as an NVIDIA CUDA kernel☆167Updated last year
- ☆140Updated 11 months ago
- An ML Systems Onboarding list☆664Updated this week
- CUDA Learning guide☆300Updated 7 months ago
- The Multilayer Perceptron Language Model☆533Updated 5 months ago
- ☆110Updated 3 weeks ago
- The Autograd Engine☆555Updated 4 months ago
- PyTorch from scratch in pure C/CUDA and Python☆39Updated 3 months ago
- Accelerated General (FP32) Matrix Multiplication☆90Updated 3 weeks ago
- Slides, notes, and materials for the workshop☆310Updated 7 months ago
- NVIDIA tools guide☆98Updated 3 weeks ago
- learningggggggg 🐳☆146Updated 3 weeks ago
- Simple byte pair encoding mechanism used for tokenization, written purely in C☆122Updated 2 months ago
- Alex Krizhevsky's original code from Google Code☆190Updated 8 years ago
- Implementation of Diffusion Transformer (DiT) in JAX☆261Updated 7 months ago
- (WIP) A small but powerful, homemade PyTorch from scratch.☆515Updated this week
- ☆820Updated 3 weeks ago