Shenggan/awesome-distributed-ml

A curated list of awesome projects and papers for distributed training or inference

☆279

Alternatives and similar repositories for awesome-distributed-ml

Users who are interested in awesome-distributed-ml are comparing it to the repositories listed below.

alibaba / SRDiffusion
Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation
☆20 · Jun 11, 2025 · Updated last year
alibaba / easydist
Automated Parallelization System and Infrastructure for Multiple Ecosystems
☆81 · Nov 19, 2024 · Updated last year
ConnollyLeon / awesome-Auto-Parallelism
A baseline repository of Auto-Parallelism in Training Neural Networks
☆145 · Jun 25, 2022 · Updated 4 years ago
hpcaitech / ColossalAI-Benchmark
Performance benchmarking with ColossalAI
☆39 · Jul 6, 2022 · Updated 4 years ago
S-Lab-System-Group / Awesome-DL-Scheduling-Papers
☆333 · Jan 22, 2024 · Updated 2 years ago
Hsword / Awesome-Machine-Learning-System-Papers
☆80 · Mar 7, 2022 · Updated 4 years ago
google / iopddl
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
☆25 · May 12, 2025 · Updated last year
microsoft / ark
A GPU-driven system framework for scalable AI applications
☆130 · Jul 15, 2026 · Updated 2 weeks ago
hpcaitech / ColossalAI-Documentation
Documentation for Colossal-AI
☆25 · Jun 6, 2025 · Updated last year
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆58 · Aug 21, 2024 · Updated last year
hpcaitech / TensorNVMe
A Python library transfers PyTorch tensors between CPU and NVMe
☆124 · Nov 27, 2024 · Updated last year
lambda7xx / awesome-AI-system
paper and its code for AI System
☆377 · May 14, 2026 · Updated 2 months ago
NUS-HPC-AI-Lab / oh-my-server
☆30 · Sep 4, 2023 · Updated 2 years ago
microsoft / NPKit
NCCL Profiling Kit
☆156 · Jul 1, 2024 · Updated 2 years ago
alpa-projects / alpa
Training and serving large-scale neural networks with auto parallelization.
☆3,180 · Dec 9, 2023 · Updated 2 years ago
HuaizhengZhang / AI-Infra-from-Zero-to-Hero
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Mod…
☆4,233 · Jul 25, 2025 · Updated last year
byungsoo-oh / ml-systems-papers
Curated collection of papers in machine learning systems
☆638 · Feb 7, 2026 · Updated 5 months ago
awslabs / slapo
A schedule language for large model training
☆153 · Aug 21, 2025 · Updated 11 months ago
galeselee / Awesome_LLM_System-PaperList
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap…
☆286 · Mar 6, 2025 · Updated last year
eth-easl / orion
An interference-aware scheduler for fine-grained GPU sharing
☆164 · Nov 26, 2025 · Updated 8 months ago
AmadeusChan / Awesome-LLM-System-Papers
☆646 · Jan 14, 2026 · Updated 6 months ago
dywsjtu / apparate
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
☆24 · Nov 21, 2024 · Updated last year
RulinShao / FastCkpt
Python package for rematerialization-aware gradient checkpointing
☆27 · Oct 31, 2023 · Updated 2 years ago
KnowingNothing / compiler-and-arch
A list of tutorials, paper, talks, and open-source projects for emerging compiler and architecture
☆533 · Jan 15, 2025 · Updated last year
zhuohan123 / terapipe
☆79 · May 4, 2021 · Updated 5 years ago
iree-org / iree-torch
Torch Frontend for IREE
☆26 · Dec 21, 2023 · Updated 2 years ago
HPDL-Group / Merak
☆86 · Feb 11, 2026 · Updated 5 months ago
DachengLi1 / AMP
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
☆44 · Nov 4, 2022 · Updated 3 years ago
sail-sg / zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
☆464 · May 7, 2025 · Updated last year
S-Lab-System-Group / Awesome-ML-for-System
SOTA Learning-augmented Systems
☆37 · May 21, 2022 · Updated 4 years ago
saareliad / FTPipe
FTPipe and related pipeline model parallelism research.
☆44 · May 16, 2023 · Updated 3 years ago
WukLab / InferCept
☆34 · Jun 22, 2024 · Updated 2 years ago
bytedance / flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,348 · Aug 28, 2025 · Updated 11 months ago
merrymercy / awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
☆2,771 · Oct 19, 2024 · Updated last year
pytorch / PiPPy
Pipeline Parallelism for PyTorch
☆786 · Aug 21, 2024 · Updated last year
msr-fiddle / dnn-partitioning
☆42 · Oct 12, 2020 · Updated 5 years ago
guanh01 / CS692-mlsys
This is the (evolving) reading list for the seminar.
☆62 · Nov 4, 2020 · Updated 5 years ago
wu-kan / wuk_cupti_wrapper
a simple API to use CUPTI
☆10 · Aug 19, 2025 · Updated 11 months ago
parasailteam / coconet
☆85 · Dec 2, 2022 · Updated 3 years ago