Slides, notes, and materials for the workshop
☆339 · Jun 1, 2024 · Updated last year
Alternatives and similar repositories for gpu-optimization-workshop
Users who are interested in gpu-optimization-workshop compare it to the repositories listed below.
- Slides and recordings of talks hosted by our community ☆21 · Jun 21, 2024 · Updated last year
- GPU programming related news and material links ☆1,997 · Sep 17, 2025 · Updated 5 months ago
- Material for gpu-mode lectures ☆5,773 · Feb 1, 2026 · Updated last month
- See https://github.com/cuda-mode/triton-index/ instead! ☆11 · May 8, 2024 · Updated last year
- PyTorch native quantization and sparsity for training and inference ☆2,707 · Updated this week
- An ML Systems Onboarding list ☆994 · Feb 19, 2026 · Updated last week
- ☆91 · Feb 29, 2024 · Updated 2 years ago
- Fine-tune an LLM to perform batch inference and online serving. ☆121 · May 29, 2025 · Updated 9 months ago
- Cataloging released Triton kernels. ☆295 · Sep 9, 2025 · Updated 5 months ago
- Puzzles for learning Triton ☆2,314 · Nov 18, 2024 · Updated last year
- Unconditional music synthesis using a diffusion model in the STFT domain ☆12 · May 31, 2022 · Updated 3 years ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆831 · Updated this week
- ☆46 · Jun 1, 2023 · Updated 2 years ago
- Full finetuning of large language models without large memory requirements ☆94 · Sep 22, 2025 · Updated 5 months ago
- A PyTorch native platform for training generative AI models ☆5,098 · Updated this week
- Lite weight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by… ☆34 · Aug 24, 2024 · Updated last year
- Efficient Triton Kernels for LLM Training ☆6,162 · Updated this week
- What would you do with 1000 H100s... ☆1,154 · Jan 10, 2024 · Updated 2 years ago
- High-Performance FP32 GEMM on CUDA devices ☆117 · Jan 21, 2025 · Updated last year
- Official Repository for "Training-Free Multi-Step Audio Source Separation" ☆54 · May 26, 2025 · Updated 9 months ago
- https://huyenchip.com/ml-interviews-book/ ☆4,529 · Mar 21, 2025 · Updated 11 months ago
- Machine Learning Engineering Open Book ☆17,162 · Feb 21, 2026 · Updated last week
- Tile primitives for speedy kernels ☆3,183 · Updated this week
- Solve puzzles. Learn CUDA. ☆11,959 · Sep 1, 2024 · Updated last year
- ☆294 · Updated this week
- Accessible large language models via k-bit quantization for PyTorch. ☆7,997 · Updated this week
- Learn CUDA with PyTorch ☆231 · Feb 23, 2026 · Updated last week
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ☆6,184 · Aug 22, 2025 · Updated 6 months ago
- ring-attention experiments ☆165 · Oct 17, 2024 · Updated last year
- a minimal cache manager for PagedAttention, on top of llama3. ☆136 · Aug 26, 2024 · Updated last year
- Document Q&A over The Full Stack's Corpus ☆360 · Aug 18, 2024 · Updated last year
- ☆15 · Oct 24, 2023 · Updated 2 years ago
- PyTorch native post-training library ☆5,691 · Updated this week
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆3,728 · May 21, 2025 · Updated 9 months ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs) ☆820 · Updated this week
- Solve puzzles. Improve your pytorch. ☆3,950 · Jul 15, 2024 · Updated last year
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆595 · Aug 12, 2025 · Updated 6 months ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,903 · Updated this week
- Summaries and resources for Designing Machine Learning Systems book (Chip Huyen, O'Reilly 2022) ☆4,429 · Oct 31, 2025 · Updated 4 months ago