alibaba / ServeGen

A framework for generating realistic LLM serving workloads

☆94

Alternatives and similar repositories for ServeGen

Users who are interested in ServeGen are comparing it to the repositories listed below.

HPMLL / BurstGPT
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆228 Updated 5 months ago
WukLab / preble
Stateful LLM Serving
☆93 Updated 10 months ago
Hsword / Awesome-Machine-Learning-System-Papers
☆79 Updated 3 years ago
Hsword / SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
☆134 Updated last year
hao-ai-lab / MuxServe
☆81 Updated 2 months ago
LoongServe / LoongServe
☆130 Updated last year
alibaba-edu / qwen-bailian-usagetraces-anon
☆70 Updated 2 months ago
SymbioticLab / Oobleck
A resilient distributed training framework
☆96 Updated last year
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆92 Updated 2 years ago
thustorage / Medusa
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
☆40 Updated 7 months ago
Thesys-lab / Helix-ASPLOS25
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
☆75 Updated 2 months ago
LLMServe / dLoRA-artifact
☆29 Updated last year
NEO-MLSys25 / NEO
NEO is an LLM inference engine built to ease the GPU memory crisis through CPU offloading
☆77 Updated 6 months ago
mental2008 / awesome-papers
Here are my personal paper reading notes (including cloud computing, resource management, systems, machine learning, deep learning, and o…
☆146 Updated this week
DicardoX / Research-Space
This repository is established to store personal notes and annotated papers during daily research.
☆173 Updated this week
eth-easl / orion
An interference-aware scheduler for fine-grained GPU sharing
☆158 Updated last month
NetX-lab / Ayo
[ASPLOS'25] Towards End-to-End Optimization of LLM-based Applications with Ayo
☆57 Updated 5 months ago
microsoft / ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆206 Updated last year
Relaxed-System-Lab / HexGen
[ICML 2024] Serving LLMs on heterogeneous decentralized clusters.
☆34 Updated last year
uclasystem / bamboo
Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances.
☆55 Updated 3 years ago
JF-D / Parcae
☆22 Updated last year
kungfu-team / tenplex
Dynamic resource changes for multi-dimensional parallelism training
☆30 Updated 4 months ago
James-QiuHaoran / LLM-serving-with-proxy-models
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an …
☆48 Updated last year
alibaba / llm-scheduling-artifact
Artifact of OSDI '24 paper, “Llumnix: Dynamic Scheduling for Large Language Model Serving”
☆64 Updated last year
pkusys / ElasticFlow
Artifacts for our ASPLOS'23 paper ElasticFlow
☆55 Updated last year
xinhao-luo / ClusterFusion
[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
☆62 Updated last month
EfficientLLMSys / MuxServe
☆15 Updated last year
snu-comparch / InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
☆170 Updated last year
Ying1123 / VTC-artifact
☆43 Updated last year
Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆45 Updated 2 years ago