AdepojuJeremy / CUDA-120-DAYS--CHALLENGE
A 120-day CUDA learning plan covering daily concepts, exercises, pitfalls, and references (including “Programming Massively Parallel Processors”). Features six capstone projects to solidify GPU parallel programming, memory management, and performance optimization skills.
☆803 Updated 7 months ago
Alternatives and similar repositories for CUDA-120-DAYS--CHALLENGE
Users who are interested in CUDA-120-DAYS--CHALLENGE are comparing it to the repositories listed below:
- This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mast… ☆414 Updated 8 months ago
- Learnings and programs related to CUDA ☆426 Updated 4 months ago
- ☆397 Updated 7 months ago
- 100 days of building GPU kernels! ☆535 Updated 6 months ago
- Complete solutions to the Programming Massively Parallel Processors Edition 4 ☆571 Updated 5 months ago
- ☆393 Updated 2 months ago
- CUDA Learning guide ☆477 Updated last year
- GPU Kernels ☆208 Updated 6 months ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆432 Updated 8 months ago
- (WIP) A small but powerful, homemade PyTorch from scratch. ☆656 Updated this week
- An ML Systems Onboarding list ☆935 Updated 9 months ago
- Apply GPU in ML and DL ☆54 Updated 2 months ago
- This repo has all the basic things you'll need in-order to understand complete vision transformer architecture and its various implementa… ☆228 Updated 10 months ago
- ☆2,053 Updated 2 weeks ago
- small auto-grad engine inspired from Karpathy's micrograd and PyTorch ☆277 Updated 11 months ago
- ☆95 Updated 3 weeks ago
- learningggggggg 🐳 ☆553 Updated 7 months ago
- Some CUDA example code with READMEs. ☆178 Updated last week
- High Quality Resources on GPU Programming/Architecture ☆590 Updated last year
- GPU programming related news and material links ☆1,787 Updated 2 months ago
- UNet diffusion model in pure CUDA ☆653 Updated last year
- Simple MPI implementation for prototyping or learning ☆288 Updated 3 months ago
- Canny edge detector implemented in CUDA C/C++ ☆27 Updated 9 months ago
- Learning about CUDA by writing PTX code. ☆147 Updated last year
- Accelerated General (FP32) Matrix Multiplication from scratch in CUDA ☆166 Updated 10 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆196 Updated 5 months ago
- It is said that, Ilya Sutskever gave John Carmack this reading list of ~ 30 research papers on deep learning. ☆932 Updated last year
- creating a tiny tensor library in raw C ☆831 Updated 8 months ago
- Visualization of cache-optimized matrix multiplication ☆155 Updated 8 months ago
- CUDA tutorials for Maths & ML tutorials with examples, covers multi-gpus, fused attention, winograd convolution, reinforcement learning. ☆200 Updated 5 months ago