HuaizhengZhang / AI-Infra-from-Zero-to-Hero
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.
☆3,694 Jul 25, 2025 Updated 6 months ago
Alternatives and similar repositories for AI-Infra-from-Zero-to-Hero
Users that are interested in AI-Infra-from-Zero-to-Hero are comparing it to the libraries listed below
- System for AI Education Resource. ☆4,222 Oct 25, 2024 Updated last year
- 《Machine Learning Systems: Design and Implementation》- Chinese Version ☆4,761 Apr 13, 2024 Updated last year
- A list of awesome compiler projects and papers for tensor computation and deep learning. ☆2,731 Oct 19, 2024 Updated last year
- CS294; AI For Systems and Systems For AI ☆227 Aug 30, 2019 Updated 6 years ago
- My learning notes for ML SYS. ☆5,351 Jan 30, 2026 Updated 2 weeks ago
- 📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉 ☆4,990 Jan 18, 2026 Updated last month
- Systems for ML/AI & ML/AI for Systems paper reading list: A curated reading list of computer science research for work at the intersectio… ☆283 Jun 9, 2025 Updated 8 months ago
- Large Language Model (LLM) Systems Paper List ☆1,818 Feb 8, 2026 Updated last week
- A collection of compiler learning resources. ☆2,678 Mar 19, 2025 Updated 10 months ago
- Dive into Deep Learning Compiler ☆644 Jun 19, 2022 Updated 3 years ago
- A high performance and generic framework for distributed DNN training ☆3,716 Oct 3, 2023 Updated 2 years ago
- Tutorial code on how to build your own Deep Learning System in 2k Lines ☆2,017 Oct 4, 2018 Updated 7 years ago
- A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description. ☆1,006 Sep 19, 2024 Updated last year
- Advanced Topics on Systems for X ☆283 Jul 10, 2024 Updated last year
- FlashInfer: Kernel Library for LLM Serving ☆4,983 Updated this week
- Open Machine Learning Compiler Framework ☆13,117 Updated this week
- The road to hack SysML and become a system expert ☆509 Sep 25, 2024 Updated last year
- How to optimize some algorithms in CUDA. ☆2,819 Updated this week
- 📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉 ☆9,666 Updated this week
- Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training ☆1,860 Updated this week
- Distributed Compiler based on Triton for Parallel Systems ☆1,358 Updated this week
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. ☆4,771 Updated this week
- paper and its code for AI System ☆347 Feb 10, 2026 Updated last week
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… ☆3,896 Updated this week
- ☆630 Jan 14, 2026 Updated last month
- Training and serving large-scale neural networks with auto parallelization. ☆3,180 Dec 9, 2023 Updated 2 years ago
- Paper Reading Notes (Distributed Systems, Virtualization, Machine Learning) ☆2,200 Jun 1, 2022 Updated 3 years ago
- Material for gpu-mode lectures ☆5,752 Feb 1, 2026 Updated 2 weeks ago
- [MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration ☆199 Apr 27, 2022 Updated 3 years ago
- Ongoing research training transformer models at scale ☆15,213 Updated this week
- CUDA Templates and Python DSLs for High-Performance Linear Algebra ☆9,266 Feb 11, 2026 Updated last week
- Transformer related optimization, including BERT, GPT ☆6,392 Mar 27, 2024 Updated last year
- This is the (evolving) reading list for the seminar. ☆61 Nov 4, 2020 Updated 5 years ago
- BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads. ☆916 Dec 30, 2024 Updated last year
- Development repository for the Triton language and compiler ☆18,429 Updated this week
- SGLang is a high-performance serving framework for large language models and multimodal models. ☆23,547 Updated this week
- OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient. ☆9,392 Dec 4, 2025 Updated 2 months ago
- Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs ☆929 Nov 27, 2025 Updated 2 months ago
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. ☆41,259 Updated this week