WukLab / prebleLinks

Stateful LLM Serving

☆87

Alternatives and similar repositories for preble

Users that are interested in preble are comparing it to the libraries listed below

Sorting:

Hsword / SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
☆130Updated last year
LoongServe / LoongServe
☆124Updated 11 months ago
hao-ai-lab / MuxServe
☆74Updated last week
microsoft / ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆188Updated last year
alibaba / llm-scheduling-artifact
Artifact of OSDI '24 paper, ”Llumnix: Dynamic Scheduling for Large Language Model Serving“
☆62Updated last year
alibaba / ServeGen
A framework for generating realistic LLM serving workloads
☆71Updated 2 weeks ago
YaoJiayi / CacheBlend
☆144Updated 3 months ago
SymbioticLab / Oobleck
A resilient distributed training framework
☆96Updated last year
UChi-JCL / CacheGen
☆136Updated last year
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆138Updated 9 months ago
HPMLL / BurstGPT
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆215Updated 3 months ago
infinigence / Semi-PD
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆113Updated 5 months ago
hao-ai-lab / vllm-ltr
[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank
☆61Updated 11 months ago
mutinifni / splitwise-sim
LLM serving cluster simulator
☆116Updated last year
NEO-MLSys25 / NEO
NEO is a LLM inference engine built to save the GPU memory crisis by CPU offloading
☆67Updated 4 months ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆181Updated 2 weeks ago
alibaba-edu / qwen-bailian-usagetraces-anon
☆56Updated 4 months ago
mental2008 / awesome-papers
Here are my personal paper reading notes (including cloud computing, resource management, systems, machine learning, deep learning, and o…
☆126Updated last week
James-QiuHaoran / LLM-serving-with-proxy-models
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an …
☆47Updated last year
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆88Updated 2 years ago
Thesys-lab / Helix-ASPLOS25
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
☆68Updated last week
kvcache-ai / TrEnv-X
☆63Updated last month
Relaxed-System-Lab / HexGen
[ICML 2024] Serving LLMs on heterogeneous decentralized clusters.
☆30Updated last year
Hsword / Awesome-Machine-Learning-System-Papers
☆78Updated 3 years ago
JF-D / Parcae
☆21Updated last year
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆117Updated last month
stepfun-ai / StepMesh
☆309Updated 3 weeks ago
eth-easl / orion
An interference-aware scheduler for fine-grained GPU sharing
☆150Updated 9 months ago
shenh10 / DeepSeek_Simulator
☆90Updated 6 months ago
awslabs / optimizing-multitask-training-through-dynamic-pipelines
Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
☆20Updated last year