dywsjtu / apparate

Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]

☆25

Alternatives and similar repositories for apparate

Users who are interested in apparate are comparing it to the repositories listed below

SymbioticLab / ModelKeeper
A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup
☆35 Updated 2 years ago
casys-kaist / EnvPipe
☆25 Updated last year
S-Lab-System-Group / Awesome-ML-for-System
SOTA Learning-augmented Systems
☆36 Updated 3 years ago
uw-mad-dash / shockwave
Artifact for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23]
☆44 Updated 2 years ago
James-QiuHaoran / LLM-serving-with-proxy-models
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an …
☆36 Updated last year
Hsword / SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
☆123 Updated last year
Thesys-lab / Helix-ASPLOS25
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
☆53 Updated 7 months ago
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆52 Updated 10 months ago
gudiandian / ElasticFlow
☆16 Updated last year
WukLab / preble
Stateful LLM Serving
☆76 Updated 4 months ago
thustorage / Medusa
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
☆24 Updated 2 months ago
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆82 Updated last year
jasperzhong / swift
☆14 Updated 3 years ago
pkusys / Auncel
Vector search with bounded performance.
☆35 Updated last year
zhuangwang93 / Cupcake
Cupcake: A Compression Scheduler for Scalable Communication-Efficient Distributed Training (MLSys '23)
☆9 Updated 2 years ago
llm-db / FineInfer
Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024)
☆17 Updated last year
alibaba-edu / qwen-bailian-usagetraces-anon
☆26 Updated last month
Ying1123 / VTC-artifact
☆32 Updated last year
msr-fiddle / CheckFreq
☆54 Updated 4 years ago
romilbhardwaj / cilantro
Source code for OSDI 2023 paper titled "Cilantro - Performance-Aware Resource Allocation for General Objectives via Online Feedback"
☆39 Updated 2 years ago
uclasystem / bamboo
Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances.
☆50 Updated 2 years ago
pkusys / ElasticFlow
Artifacts for our ASPLOS'23 paper ElasticFlow
☆52 Updated last year
google / iopddl
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
☆23 Updated 2 months ago
alibaba / llm-scheduling-artifact
Artifact of OSDI '24 paper, "Llumnix: Dynamic Scheduling for Large Language Model Serving"
☆61 Updated last year
mutinifni / splitwise-sim
LLM serving cluster simulator
☆107 Updated last year
liangyuRain / ForestColl
☆12 Updated 2 months ago
UChi-JCL / CacheGen
☆110 Updated 9 months ago
suquark / hoplite
☆45 Updated 3 years ago
cirquit / hivemind-multi-cloud
☆9 Updated 11 months ago
bytedance / QSync
Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".
☆20 Updated last year