🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.
☆3,746 Jul 25, 2025 Updated 7 months ago
Alternatives and similar repositories for AI-Infra-from-Zero-to-Hero
Users who are interested in AI-Infra-from-Zero-to-Hero are comparing it to the repositories listed below
- System for AI Education Resource. ☆4,238 Oct 25, 2024 Updated last year
- 《Machine Learning Systems: Design and Implementation》 ☆4,775 Updated this week
- A list of awesome compiler projects and papers for tensor computation and deep learning. ☆2,733 Oct 19, 2024 Updated last year
- CS294; AI For Systems and Systems For AI ☆225 Aug 30, 2019 Updated 6 years ago
- My learning notes for ML SYS. ☆5,580 Mar 2, 2026 Updated last week
- 📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉 ☆5,047 Updated this week
- Systems for ML/AI & ML/AI for Systems paper reading list: A curated reading list of computer science research for work at the intersectio… ☆284 Jun 9, 2025 Updated 9 months ago
- Large Language Model (LLM) Systems Paper List ☆1,862 Feb 27, 2026 Updated 2 weeks ago
- A collection of compiler learning resources. ☆2,693 Mar 19, 2025 Updated 11 months ago
- Dive into Deep Learning Compiler ☆646 Jun 19, 2022 Updated 3 years ago
- A high performance and generic framework for distributed DNN training ☆3,717 Oct 3, 2023 Updated 2 years ago
- Tutorial code on how to build your own Deep Learning System in 2k Lines ☆2,016 Oct 4, 2018 Updated 7 years ago
- A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description. ☆1,005 Sep 19, 2024 Updated last year
- Advanced Topics on Systems for X ☆282 Jul 10, 2024 Updated last year
- Open Machine Learning Compiler Framework ☆13,174 Updated this week
- FlashInfer: Kernel Library for LLM Serving ☆5,101 Updated this week
- The road to hack SysML and become a system expert ☆510 Sep 25, 2024 Updated last year
- How to optimize some algorithms in CUDA. ☆2,863 Updated this week
- 📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉 ☆9,872 Updated this week
- Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training ☆1,864 Mar 7, 2026 Updated last week
- Distributed Compiler based on Triton for Parallel Systems ☆1,380 Feb 13, 2026 Updated last month
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. ☆4,880 Updated this week
- ☆633 Jan 14, 2026 Updated 2 months ago
- Training and serving large-scale neural networks with auto parallelization. ☆3,188 Dec 9, 2023 Updated 2 years ago
- Papers and their code for AI systems ☆353 Feb 10, 2026 Updated last month
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… ☆3,931 Updated this week
- Paper reading notes (distributed systems, virtualization, machine learning) ☆2,200 Jun 1, 2022 Updated 3 years ago
- Material for gpu-mode lectures ☆5,818 Feb 1, 2026 Updated last month
- [MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration ☆199 Apr 27, 2022 Updated 3 years ago
- Ongoing research training transformer models at scale ☆15,535 Mar 7, 2026 Updated last week
- Transformer related optimization, including BERT, GPT ☆6,399 Mar 27, 2024 Updated last year
- CUDA Templates and Python DSLs for High-Performance Linear Algebra ☆9,389 Mar 7, 2026 Updated last week
- This is the (evolving) reading list for the seminar. ☆62 Nov 4, 2020 Updated 5 years ago
- BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads. ☆918 Dec 30, 2024 Updated last year
- Development repository for the Triton language and compiler ☆18,573 Mar 7, 2026 Updated last week
- SGLang is a high-performance serving framework for large language models and multimodal models. ☆24,216 Updated this week
- OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient. ☆9,391 Dec 4, 2025 Updated 3 months ago
- Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs ☆995 Mar 3, 2026 Updated last week
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. ☆41,773 Updated this week