gpu-mode / awesomeMLSys
An ML Systems Onboarding list
☆664 · Updated this week
Alternatives and similar repositories for awesomeMLSys:
Users interested in awesomeMLSys are also comparing it to the repositories listed below.
- GPU programming related news and material links ☆1,347 · Updated 3 weeks ago
- Puzzles for learning Triton ☆1,337 · Updated 2 months ago
- Minimalistic 4D-parallelism distributed training framework for educational purposes ☆670 · Updated this week
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆171 · Updated this week
- High Quality Resources on GPU Programming/Architecture ☆578 · Updated 6 months ago
- Building blocks for foundation models. ☆440 · Updated last year
- UNet diffusion model in pure CUDA ☆596 · Updated 7 months ago
- My learning notes and code for ML systems. ☆477 · Updated last week
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆758 · Updated last week
- What would you do with 1000 H100s... ☆970 · Updated last year
- Material for gpu-mode lectures ☆3,567 · Updated 3 weeks ago
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆685 · Updated last month
- Fast CUDA matrix multiplication from scratch ☆599 · Updated last year
- Slides, notes, and materials for the workshop ☆310 · Updated 7 months ago
- The Multilayer Perceptron Language Model ☆533 · Updated 5 months ago
- Small autograd engine inspired by Karpathy's micrograd and PyTorch ☆244 · Updated 2 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆511 · Updated this week
- The Tensor (or Array) ☆420 · Updated 5 months ago
- Best practices & guides on how to write distributed PyTorch training code ☆342 · Updated this week
- Learnings and programs related to CUDA ☆115 · Updated last week
- FlashInfer: Kernel Library for LLM Serving ☆1,876 · Updated this week
- Alex Krizhevsky's original code from Google Code ☆190 · Updated 8 years ago
- Efficient LLM Inference over Long Sequences ☆349 · Updated last month
- From zero to hero: CUDA for accelerating maths and machine learning on GPU. ☆175 · Updated 6 months ago
- Tile primitives for speedy kernels ☆1,966 · Updated this week
- The Autograd Engine ☆555 · Updated 4 months ago
- LLM papers I'm reading, mostly on inference and model compression ☆707 · Updated last year