cfregly / ai-performance-engineering
☆878Updated last week
Alternatives and similar repositories for ai-performance-engineering
Users who are interested in ai-performance-engineering are comparing it to the libraries listed below
- Slides, notes, and materials for the workshop☆337Updated last year
- An ML Systems Onboarding list☆964Updated 11 months ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.☆451Updated 10 months ago
- This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mast…☆431Updated 10 months ago
- Complete solutions to the Programming Massively Parallel Processors Edition 4☆630Updated 6 months ago
- Some CUDA example code with READMEs.☆179Updated last month
- ☆408Updated 9 months ago
- NVIDIA curated collection of educational resources related to general purpose GPU programming.☆1,056Updated 3 weeks ago
- Learn CUDA with PyTorch☆168Updated 2 weeks ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS☆246Updated 8 months ago
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs☆801Updated this week
- 100 days of building GPU kernels!☆560Updated 8 months ago
- GPU Kernels☆217Updated 8 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand☆195Updated 7 months ago
- ☆208Updated last year
- Simple MPI implementation for prototyping or learning☆297Updated 5 months ago
- GPU documentation for humans☆478Updated last month
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O☆546Updated 3 months ago
- Learnings and programs related to CUDA☆432Updated 6 months ago
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!☆182Updated 2 weeks ago
- Apply GPU in ML and DL☆55Updated 3 months ago
- ☆233Updated last year
- ☆551Updated last year
- Helpful kernel tutorials and examples for tile-based GPU programming☆554Updated this week
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)☆467Updated 2 weeks ago
- Perplexity GPU Kernels☆548Updated 2 months ago
- ☆78Updated 2 years ago
- GPU programming related news and material links☆1,886Updated 3 months ago
- Where GPUs get cooked 👩‍🍳🔥☆345Updated 3 months ago
- Contains hands-on example code for [O'Reilly book "Deep Learning At Scale"](https://www.oreilly.com/library/view/deep-learning-at/9781098…☆31Updated last year