AdepojuJeremy / CUDA-120-DAYS--CHALLENGE
A 120-day CUDA learning plan covering daily concepts, exercises, pitfalls, and references (including “Programming Massively Parallel Processors”). Features six capstone projects to solidify GPU parallel programming, memory management, and performance optimization skills.
☆683 Updated 2 months ago
Alternatives and similar repositories for CUDA-120-DAYS--CHALLENGE
Users who are interested in CUDA-120-DAYS--CHALLENGE are comparing it to the repositories listed below.
- Learnings and programs related to CUDA ☆402 Updated 3 months ago
- This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mast… ☆348 Updated 3 months ago
- ☆328 Updated last month
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆357 Updated 2 months ago
- 100 days of building GPU kernels! ☆430 Updated last month
- small auto-grad engine inspired by Karpathy's micrograd and PyTorch ☆268 Updated 6 months ago
- (WIP) A small but powerful, homemade PyTorch from scratch. ☆553 Updated this week
- This repo has all the basic things you'll need in order to understand the complete vision transformer architecture and its various implementa … ☆218 Updated 5 months ago
- ☆255 Updated 4 months ago
- An ML Systems Onboarding list ☆794 Updated 4 months ago
- learningggggggg 🐳 ☆520 Updated 2 months ago
- GPU Kernels ☆178 Updated last month
- High Quality Resources on GPU Programming/Architecture ☆587 Updated 10 months ago
- ☆1,148 Updated last month
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆184 Updated last week
- CUDA Learning guide ☆382 Updated 11 months ago
- Assignments of courses taught at IISC as part of MTech AI curriculum ☆116 Updated 3 months ago
- UNet diffusion model in pure CUDA ☆606 Updated 11 months ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆351 Updated last month
- Tutorials on tinygrad ☆379 Updated 3 weeks ago
- a tiny multidimensional array implementation in C similar to numpy, but only one file. ☆228 Updated 10 months ago
- Visualization of cache-optimized matrix multiplication ☆147 Updated 2 months ago
- a simple CLI command that will create a template of a generic ML Project ☆80 Updated 7 months ago
- The Tensor (or Array) ☆433 Updated 9 months ago
- Learning about CUDA by writing PTX code. ☆131 Updated last year
- Apply GPU in ML and DL ☆52 Updated 3 months ago
- CUDA tutorials for Maths & ML tutorials with examples, covers multi-gpus, fused attention, winograd convolution, reinforcement learning. ☆183 Updated last month
- ☆160 Updated 2 weeks ago
- Following master Karpathy with GPT-2 implementation and training, writing lots of comments cause I have memory of a goldfish ☆173 Updated 10 months ago
- GPU programming related news and material links ☆1,540 Updated 4 months ago