dywsjtu / apparate
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
☆22 · Updated last month
Alternatives and similar repositories for apparate:
Users who are interested in apparate are comparing it to the repositories listed below:
- A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup ☆34 · Updated 2 years ago
- Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an … ☆24 · Updated 7 months ago
- SOTA Learning-augmented Systems ☆34 · Updated 2 years ago
- ☆24 · Updated last year
- Stateful LLM Serving ☆44 · Updated 5 months ago
- Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024) ☆12 · Updated 7 months ago
- ☆13 · Updated 2 years ago
- Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances. ☆48 · Updated 2 years ago
- Artifacts for our ASPLOS'23 paper ElasticFlow ☆53 · Updated 8 months ago
- ☆48 · Updated 7 months ago
- Code for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23] ☆39 · Updated 2 years ago
- ☆15 · Updated 7 months ago
- ☆16 · Updated 6 months ago
- A resilient distributed training framework ☆88 · Updated 9 months ago
- Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow" ☆14 · Updated last month
- ☆18 · Updated 6 months ago
- DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces. ☆53 · Updated 4 months ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆21 · Updated last month
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆109 · Updated 10 months ago
- ☆14 · Updated 3 years ago
- Vector search with bounded performance. ☆34 · Updated 11 months ago
- An Attention Superoptimizer ☆20 · Updated 8 months ago
- ☆53 · Updated 3 years ago
- Surrogate-based Hyperparameter Tuning System ☆28 · Updated last year
- This repository contains code for the paper: Bergsma S., Zeyl T., Senderovich A., and Beck J. C., "Generating Complex, Realistic Cloud Wo… ☆42 · Updated 3 years ago
- Primo: Practical Learning-Augmented Systems with Interpretable Models ☆19 · Updated last year
- UCCL: an Efficient Collective Communication Library for GPUs ☆18 · Updated this week
- ☆20 · Updated 3 years ago
- ☆48 · Updated 2 years ago
- Cupcake: A Compression Scheduler for Scalable Communication-Efficient Distributed Training (MLSys '23) ☆9 · Updated last year