gpu-mode/awesomeMLSys

gpu-mode / awesomeMLSys

An ML Systems Onboarding list

☆1,102

Alternatives and similar repositories for awesomeMLSys

Users that are interested in awesomeMLSys are comparing it to the libraries listed below.

gpu-mode / resource-stream
GPU programming related news and material links
☆2,233 · Jun 15, 2026 · Updated last month
gpu-mode / lectures
Material for gpu-mode lectures
☆6,330 · Jun 15, 2026 · Updated last month
gpu-mode / triton-index
Cataloging released Triton kernels.
☆311 · Sep 9, 2025 · Updated 10 months ago
gpu-mode / Triton-Puzzles
Puzzles for learning Triton
☆2,531 · Apr 1, 2026 · Updated 3 months ago
HazyResearch / aisys-building-blocks
Building blocks for foundation models.
☆633 · Jan 3, 2024 · Updated 2 years ago
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆1,063 · Updated this week
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆5,988 · Updated this week
zhaochenyang20 / Awesome-ML-SYS-Tutorial
My learning notes for ML SYS.
☆6,753 · Updated this week
HazyResearch / ThunderKittens
Tile primitives for speedy kernels
☆3,552 · Jul 13, 2026 · Updated last week
arpitingle / gpu-alpha
High Quality Resources on GPU Programming/Architecture
☆592 · Jul 26, 2024 · Updated last year
stas00 / ml-engineering
Machine Learning Engineering Open Book
☆18,432 · Updated this week
AmberLJC / LLMSys-PaperList
Large Language Model (LLM) Systems Paper List
☆2,195 · Updated this week
linkedin / Liger-Kernel
Efficient Triton Kernels for LLM Training
☆6,528 · Updated this week
sgl-project / sgl-learning-materials
Materials for learning SGLang
☆860 · Jan 5, 2026 · Updated 6 months ago
xlite-dev / LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
☆11,578 · Updated this week
huggingface / picotron
Minimalistic 4D-parallelism distributed training framework for education purpose
☆2,254 · Aug 26, 2025 · Updated 10 months ago
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,494 · Updated this week
xlite-dev / Awesome-LLM-Inference
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
☆5,404 · Jun 23, 2026 · Updated 3 weeks ago
HuaizhengZhang / AI-Infra-from-Zero-to-Hero
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Mod…
☆4,209 · Jul 25, 2025 · Updated 11 months ago
MLSys-Learner-Resources / Awesome-MLSys-Blogger
The repository has collected a batch of noteworthy MLSys bloggers (Algorithms/Systems)
☆340 · Jan 5, 2025 · Updated last year
BBuf / how-to-optim-algorithm-in-cuda
how to optimize some algorithm in cuda.
☆3,142 · Updated this week
srush / GPU-Puzzles
Solve puzzles. Learn CUDA.
☆12,332 · Sep 1, 2024 · Updated last year
sgl-project / mini-sglang
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
☆4,607 · May 17, 2026 · Updated 2 months ago
NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,104 · Updated this week
triton-lang / triton
Development repository for the Triton language and compiler
☆19,738 · Updated this week
pytorch / torchtitan
A PyTorch native platform for training generative AI models
☆5,545 · Updated this week
srush / LLM-Training-Puzzles
What would you do with 1000 H100s...
☆1,181 · Jan 10, 2024 · Updated 2 years ago
srush / Tensor-Puzzles
Solve puzzles. Improve your pytorch.
☆4,237 · Jul 15, 2024 · Updated 2 years ago
SiriusNEO / Triton-Puzzles-Lite
Puzzles for learning Triton, play it with minimal environment configuration!
☆735 · Mar 17, 2026 · Updated 4 months ago
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆322 · Aug 22, 2025 · Updated 10 months ago
facebookresearch / tensor-layouts
A pure-Python implementation of the Nvidia CuTe layout algebra intended to be approachable and easy to learn.
☆231 · Jun 29, 2026 · Updated 3 weeks ago
pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆910 · Updated this week
stas00 / the-art-of-debugging
The Art of Debugging Open Book
☆1,667 · Jul 9, 2026 · Updated last week
meta-pytorch / gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
☆6,229 · Aug 22, 2025 · Updated 10 months ago
sgl-project / sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
☆30,545 · Updated this week
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆606 · May 13, 2026 · Updated 2 months ago
perplexityai / pplx-kernels
Perplexity GPU Kernels
☆591 · Nov 7, 2025 · Updated 8 months ago
fla-org / flash-linear-attention
🚀 Efficient implementations for emerging model architectures
☆5,379 · Updated this week
tile-ai / tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
☆6,674 · Updated this week