mlops-discord / gpu-optimization-workshop
Slides, notes, and materials for the workshop
☆309 · Updated 7 months ago
Alternatives and similar repositories for gpu-optimization-workshop:
Users who are interested in gpu-optimization-workshop compare it to the repositories listed below.
- An ML Systems Onboarding list ☆647 · Updated 2 months ago
- UNet diffusion model in pure CUDA ☆596 · Updated 6 months ago
- Best practices & guides on how to write distributed pytorch training code ☆336 · Updated this week
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆161 · Updated this week
- Building blocks for foundation models. ☆435 · Updated last year
- Minimalistic 4D-parallelism distributed training framework for education purpose ☆644 · Updated this week
- End-to-End LLM Guide ☆99 · Updated 6 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆505 · Updated 2 months ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆114 · Updated last year
- ☆138 · Updated 11 months ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆752 · Updated this week
- PyTorch per step fault tolerance (actively under development) ☆220 · Updated this week
- ☆91 · Updated 2 weeks ago
- Puzzles for learning Triton ☆1,300 · Updated last month
- ☆267 · Updated 6 months ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day ☆253 · Updated last year
- ☆67 · Updated 6 months ago
- LLM KV cache compression made easy ☆303 · Updated this week
- ☆170 · Updated this week
- What would you do with 1000 H100s... ☆948 · Updated last year
- The Tensor (or Array) ☆418 · Updated 5 months ago
- GPU programming related news and material links ☆1,312 · Updated last week
- Scalable and Performant Data Loading ☆207 · Updated this week
- Cataloging released Triton kernels. ☆155 · Updated last week
- For optimization algorithm research and development. ☆484 · Updated this week
- Alex Krizhevsky's original code from Google Code ☆190 · Updated 8 years ago
- Applied AI experiments and examples for PyTorch ☆211 · Updated this week
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆167 · Updated last year
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆182 · Updated this week
- Transform datasets at scale. Optimize datasets for fast AI model training. ☆396 · Updated this week