gpu-mode / awesomeMLSys
An ML Systems Onboarding list
☆547 Updated last week
Related projects
Alternatives and complementary repositories for awesomeMLSys
- GPU programming related news and material links ☆1,242 Updated last month
- Puzzles for learning Triton ☆1,138 Updated this week
- Slides, notes, and materials for the workshop ☆306 Updated 5 months ago
- UNet diffusion model in pure CUDA ☆584 Updated 4 months ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆717 Updated last month
- Building blocks for foundation models. ☆397 Updated 10 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆483 Updated 3 weeks ago
- High Quality Resources on GPU Programming/Architecture ☆566 Updated 3 months ago
- The Tensor (or Array) ☆411 Updated 3 months ago
- Alex Krizhevsky's original code from Google Code ☆190 Updated 8 years ago
- What would you do with 1000 H100s... ☆910 Updated 10 months ago
- NanoGPT (124M) in 5 minutes ☆1,269 Updated this week
- LLM papers I'm reading, mostly on inference and model compression ☆694 Updated 11 months ago
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆157 Updated last year
- Best practices & guides on how to write distributed pytorch training code ☆288 Updated this week
- The Multilayer Perceptron Language Model ☆523 Updated 3 months ago
- The Autograd Engine ☆535 Updated 2 months ago
- ☆563 Updated 3 weeks ago
- Material for gpu-mode lectures ☆3,028 Updated this week
- ☆133 Updated 9 months ago
- Tile primitives for speedy kernels ☆1,661 Updated this week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆107 Updated last year
- This repo has all the basic things you'll need in-order to understand complete vision transformer architecture and its various implementa… ☆170 Updated last month
- Following master Karpathy with GPT-2 implementation and training, writing lots of comments cause I have memory of a goldfish ☆167 Updated 3 months ago
- A bibliography and survey of the papers surrounding o1 ☆780 Updated last week
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆626 Updated 7 months ago
- small auto-grad engine inspired from Karpathy's micrograd and PyTorch ☆175 Updated this week
- ☆391 Updated last month
- System 2 Reasoning Link Collection ☆694 Updated 3 weeks ago
- ☆153 Updated this week