byungsoo-oh/ml-systems-papers

Curated collection of papers in machine learning systems

☆632

Alternatives and similar repositories for ml-systems-papers

Users interested in ml-systems-papers are comparing it to the repositories listed below.

AmberLJC / LLMSys-PaperList
Large Language Model (LLM) Systems Paper List
☆2,194 · Updated this week
S-Lab-System-Group / Awesome-DL-Scheduling-Papers
☆332 · Jan 22, 2024 · Updated 2 years ago
lambda7xx / awesome-AI-system
paper and its code for AI System
☆374 · May 14, 2026 · Updated 2 months ago
galeselee / Awesome_LLM_System-PaperList
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap…
☆284 · Mar 6, 2025 · Updated last year
DicardoX / Research-Space
This repository is established to store personal notes and annotated papers during daily research.
☆200 · Jun 28, 2026 · Updated 3 weeks ago
MoE-Inf / awesome-moe-inference
Curated collection of papers in MoE model inference
☆408 · Mar 12, 2026 · Updated 4 months ago
microsoft / vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
☆504 · Updated this week
casys-kaist / LLMServingSim
LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure
☆340 · Updated this week
kungfu-team / tenplex
Dynamic resources changes for multi-dimensional parallelism training
☆31 · Aug 22, 2025 · Updated 10 months ago
xlite-dev / Awesome-LLM-Inference
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
☆5,404 · Jun 23, 2026 · Updated 3 weeks ago
AmadeusChan / Awesome-LLM-System-Papers
☆646 · Jan 14, 2026 · Updated 6 months ago
chenhongyu2048 / LLM-inference-optimization-paper
Summary of some awesome work for optimizing LLM inference
☆261 · Feb 14, 2026 · Updated 5 months ago
mental2008 / awesome-papers
Here are my personal paper reading notes (including machine learning systems, AI infrastructure, and other interesting stuffs).
☆216 · Updated this week
open-neutrino / neutrino
☆263 · Dec 25, 2025 · Updated 6 months ago
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆511 · Jan 8, 2026 · Updated 6 months ago
LLMServe / DistServe
Disaggregated serving system for Large Language Models (LLMs).
☆827 · Apr 6, 2025 · Updated last year
thu-pacman / FasterMoE
☆92 · Apr 2, 2022 · Updated 4 years ago
ByteDance-Seed / StragglerAnalysis
☆56 · Apr 30, 2025 · Updated last year
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,493 · Jul 11, 2026 · Updated last week
TJU-NSL / awesome-papers
☆37 · Updated this week
Shenggan / awesome-distributed-ml
A curated list of awesome projects and papers for distributed training or inference
☆280 · Oct 8, 2024 · Updated last year
efeslab / Nanoflow
A throughput-oriented high-performance serving framework for LLMs
☆968 · Mar 29, 2026 · Updated 3 months ago
HPMLL / BurstGPT
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆278 · Jun 30, 2026 · Updated 2 weeks ago
aliyun / SimAI
☆1,029 · Apr 24, 2026 · Updated 2 months ago
alibaba / ServeGen
A framework for generating realistic LLM serving workloads
☆161 · May 11, 2026 · Updated 2 months ago
llumnix-project / llumnix-ray
Efficient and easy multi-instance LLM serving
☆562 · Mar 12, 2026 · Updated 4 months ago
astra-sim / astra-sim
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
☆641 · Apr 25, 2026 · Updated 2 months ago
eth-easl / sailor
AI model training on heterogeneous, geo-distributed resources
☆46 · Nov 24, 2025 · Updated 7 months ago
HuaizhengZhang / AI-Infra-from-Zero-to-Hero
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Mod…
☆4,206 · Jul 25, 2025 · Updated 11 months ago
WukLab / preble
Stateful LLM Serving
☆105 · Mar 11, 2025 · Updated last year
zhaochenyang20 / Awesome-ML-SYS-Tutorial
My learning notes for ML SYS.
☆6,743 · Updated this week
SJTU-IPADS / MetaAttention
MetaAttention: A Unified and Performant Attention Framework Across Hardware Backends(PPoPP'26)
☆16 · Dec 31, 2025 · Updated 6 months ago
uccl-project / uccl
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g…
☆1,465 · Updated this week
JF-D / Proteus
☆24 · Jul 7, 2024 · Updated 2 years ago
bytedance / flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,343 · Aug 28, 2025 · Updated 10 months ago
osayamenja / FlashMoE
Distributed MoE in a Single Kernel [NeurIPS '25]
☆272 · May 5, 2026 · Updated 2 months ago
InternLM / AcmeTrace
☆179 · Mar 12, 2024 · Updated 2 years ago
LoongServe / LoongServe
☆135 · Nov 11, 2024 · Updated last year
antgroup / glake
GLake: optimizing GPU memory management and IO transmission.
☆501 · Mar 24, 2025 · Updated last year