alpa-projects/mms

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)

☆94

Alternatives and similar repositories for mms

Users who are interested in mms are comparing it to the libraries listed below.

Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆45 · Nov 13, 2023 · Updated 2 years ago
Hsword / SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
☆135 · Feb 22, 2024 · Updated 2 years ago
DS3Lab / Decentralized_FM_alpha
☆18 · May 4, 2023 · Updated 3 years ago
TankLabTJU / INFless
The source code of INFless, a native serverless platform for AI inference.
☆46 · Oct 10, 2022 · Updated 3 years ago
IBM / LLM-performance-prediction
Predict the performance of LLM inference services
☆23 · Sep 18, 2025 · Updated 10 months ago
HPMLL / BurstGPT
A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆280 · Jun 30, 2026 · Updated 3 weeks ago
LLMServe / DistServe
Disaggregated serving system for Large Language Models (LLMs).
☆826 · Apr 6, 2025 · Updated last year
lzhangbv / acpsgd
[ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning
☆10 · Apr 28, 2023 · Updated 3 years ago
EfficientLLMSys / MuxServe
☆15 · Jun 26, 2024 · Updated 2 years ago
thu-pacman / SmartMoE-AE
ATC23 AE
☆45 · May 11, 2023 · Updated 3 years ago
Azure / AzurePublicDataset
Microsoft Azure Traces
☆1,159 · Jun 3, 2026 · Updated last month
llumnix-project / llumnix-ray
Efficient and easy multi-instance LLM serving
☆563 · Mar 12, 2026 · Updated 4 months ago
infinigence / HamiltonAttention
☆45 · Oct 15, 2025 · Updated 9 months ago
uwsampl / nexus
☆85 · Feb 5, 2026 · Updated 5 months ago
jiazhihao / attention_superoptimizer
An Attention Superoptimizer
☆22 · Jan 20, 2025 · Updated last year
efeslab / Nanoflow
A throughput-oriented high-performance serving framework for LLMs
☆969 · Mar 29, 2026 · Updated 3 months ago
stanford-mast / INFaaS
Model-less Inference Serving
☆94 · Nov 4, 2023 · Updated 2 years ago
microsoft / ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆223 · Sep 21, 2024 · Updated last year
flexflow / flexflow-train
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
☆1,897 · Jul 1, 2026 · Updated 3 weeks ago
PKUZHOU / PetS-ATC-2022
☆10 · Sep 14, 2023 · Updated 2 years ago
awslabs / slapo
A schedule language for large model training
☆153 · Aug 21, 2025 · Updated 11 months ago
microsoft / apex_plus
APEX+ is an LLM Serving Simulator
☆49 · Jun 16, 2025 · Updated last year
microsoft / vidur
Accurate, large-scale, and extensible simulator for LLM inference systems
☆646 · Jul 25, 2025 · Updated last year
hao-ai-lab / MuxServe
☆90 · Oct 17, 2025 · Updated 9 months ago
suquark / ExoFlow
A universal workflow system for exactly-once DAGs
☆23 · Jun 1, 2023 · Updated 3 years ago
casys-kaist / EnvPipe
☆27 · Aug 31, 2023 · Updated 2 years ago
resource-disaggregation / jiffy
Virtual Memory Abstraction for Serverless Architectures
☆49 · Mar 18, 2022 · Updated 4 years ago
eniac / paella
Paella: Low-latency Model Serving with Virtualized GPU Scheduling
☆72 · May 1, 2024 · Updated 2 years ago
chhzh123 / ptc-tutorial
PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo
☆17 · Mar 13, 2023 · Updated 3 years ago
All-less / faas-scheduling-benchmark
A benchmark suite for evaluating FaaS schedulers.
☆23 · Nov 5, 2022 · Updated 3 years ago
zhengzangw / Sequence-Scheduling
PyTorch implementation of the paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
☆93 · May 23, 2023 · Updated 3 years ago
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆58 · Aug 21, 2024 · Updated last year
suquark / hoplite
☆43 · Sep 6, 2021 · Updated 4 years ago
sigserverless / Dilu
This is the code repository for our work on resourcing-on-demand GPU provisioning for serverless deep learning serving.
☆17 · Apr 17, 2026 · Updated 3 months ago
efeslab / fiddler
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
☆267 · Nov 18, 2024 · Updated last year
jashwantraj92 / cocktail
☆16 · Aug 15, 2024 · Updated last year
llm-db / FineInfer
Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024)
☆19 · May 28, 2024 · Updated 2 years ago
FMInference / H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
☆530 · Aug 1, 2024 · Updated last year
thu-pacman / PET
PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
☆126 · Jun 23, 2022 · Updated 4 years ago