Hsword / SpotServe

SpotServe: Serving Generative Large Language Models on Preemptible Instances

☆135

Alternatives and similar repositories for SpotServe

Users who are interested in SpotServe are comparing it to the repositories listed below.


LLMServe / DistServe
Disaggregated serving system for Large Language Models (LLMs).
☆776 · Apr 6, 2025 · Updated 10 months ago
hao-ai-lab / MuxServe
☆85 · Oct 17, 2025 · Updated 3 months ago
LoongServe / LoongServe
☆131 · Nov 11, 2024 · Updated last year
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆470 · Jan 8, 2026 · Updated last month
microsoft / ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆209 · Sep 21, 2024 · Updated last year
AlibabaPAI / llumnix
Efficient and easy multi-instance LLM serving
☆527 · Sep 3, 2025 · Updated 5 months ago
ServerlessLLM / ServerlessLLM
Serverless LLM Serving for Everyone.
☆647 · Jan 23, 2026 · Updated 3 weeks ago
dywsjtu / apparate
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
☆25 · Nov 21, 2024 · Updated last year
IBM / LLM-performance-prediction
Predict the performance of LLM inference services
☆21 · Sep 18, 2025 · Updated 4 months ago
JF-D / Parcae
☆22 · Apr 22, 2024 · Updated last year
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆93 · Jul 14, 2023 · Updated 2 years ago
HPMLL / BurstGPT
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆238 · Feb 1, 2026 · Updated last week
efeslab / Nanoflow
A throughput-oriented high-performance serving framework for LLMs
☆945 · Oct 29, 2025 · Updated 3 months ago
eth-easl / orion
An interference-aware scheduler for fine-grained GPU sharing
☆159 · Nov 26, 2025 · Updated 2 months ago
flexflow / flexflow-train
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
☆1,859 · Feb 7, 2026 · Updated last week
tyler-griggs / melange-release
☆47 · Jun 27, 2024 · Updated last year
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆58 · Aug 21, 2024 · Updated last year
Relaxed-System-Lab / HexGen
[ICML 2024] Serving LLMs on heterogeneous decentralized clusters.
☆34 · May 6, 2024 · Updated last year
uclasystem / bamboo
Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances.
☆55 · Dec 11, 2022 · Updated 3 years ago
microsoft / vidur
A large-scale simulation framework for LLM inference
☆530 · Jul 25, 2025 · Updated 6 months ago
Thesys-lab / Helix-ASPLOS25
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
☆76 · Oct 15, 2025 · Updated 3 months ago
jiazhihao / attention_superoptimizer
An Attention Superoptimizer
☆22 · Jan 20, 2025 · Updated last year
resource-disaggregation / jiffy
Virtual Memory Abstraction for Serverless Architectures
☆49 · Mar 18, 2022 · Updated 3 years ago
Trinity-data-store / Trinity
EuroSys '24: "Trinity: A Fast Compressed Multi-attribute Data Store"
☆19 · Mar 8, 2025 · Updated 11 months ago
uw-mad-dash / shockwave
Artifact for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23]
☆46 · Nov 24, 2022 · Updated 3 years ago
pkusys / ElasticFlow
Artifacts for our ASPLOS'23 paper ElasticFlow
☆55 · May 10, 2024 · Updated last year
AmberLJC / LLMSys-PaperList
Large Language Model (LLM) Systems Paper List
☆1,818 · Updated this week
lambda7xx / awesome-AI-system
paper and its code for AI System
☆347 · Updated this week
ruipeterpan / paper_notes
Personal blog + reading notes on system-ish papers
☆15 · Oct 29, 2023 · Updated 2 years ago
mutinifni / splitwise-sim
LLM serving cluster simulator
☆135 · Apr 25, 2024 · Updated last year
zhengzangw / Sequence-Scheduling
PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
☆93 · May 23, 2023 · Updated 2 years ago
project-etalon / etalon
LLM Serving Performance Evaluation Harness
☆83 · Feb 25, 2025 · Updated 11 months ago
alibaba / llm-scheduling-artifact
Artifact of OSDI '24 paper, "Llumnix: Dynamic Scheduling for Large Language Model Serving"
☆64 · Jun 5, 2024 · Updated last year
snu-comparch / InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
☆174 · Jul 10, 2024 · Updated last year
tonyzhao-jt / LLM-PQ
Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …
☆36 · Aug 29, 2025 · Updated 5 months ago
S-Lab-System-Group / Awesome-DL-Scheduling-Papers
☆323 · Jan 22, 2024 · Updated 2 years ago
WukLab / InferCept
☆34 · Jun 22, 2024 · Updated last year
alibaba / ServeGen
A framework for generating realistic LLM serving workloads
☆100 · Oct 9, 2025 · Updated 4 months ago
Rivendile / Muri
Artifacts for our SIGCOMM'22 paper Muri
☆43 · Dec 29, 2023 · Updated 2 years ago