microsoft / OPPerTune
☆17 Updated 10 months ago
Alternatives and similar repositories for OPPerTune
Users who are interested in OPPerTune are comparing it to the repositories listed below.
- Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving ☆37 Updated 6 years ago
- TraceWeaver is a research prototype for transparently tracing requests through a microservice without application instrumentation. ☆23 Updated last year
- This repository contains code for the paper: Bergsma S., Zeyl T., Senderovich A., and Beck J. C., "Generating Complex, Realistic Cloud Wo… ☆43 Updated 4 years ago
- Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024) ☆19 Updated last year
- ☆44 Updated 4 years ago
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24] ☆25 Updated last year
- ☁️ Benchmarking LLMs for Cloud Config Generation ☆39 Updated last year
- ☆15 Updated 3 years ago
- ☆23 Updated 4 years ago
- Code for "Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP", which appeared at SOSP 2021 ☆28 Updated 4 years ago
- MeshInsight: Dissecting Overheads of Service Mesh Sidecars ☆47 Updated 2 years ago
- FaaSNet: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute (USENIX ATC'21) ☆56 Updated 4 years ago
- Primo: Practical Learning-Augmented Systems with Interpretable Models ☆19 Updated 2 years ago
- Tiresias is a GPU cluster manager for distributed deep learning training. ☆166 Updated 5 years ago
- Distributed tracing data from Meta's microservices architecture. ☆25 Updated 2 years ago
- Source code for OSDI 2023 paper titled "Cilantro - Performance-Aware Resource Allocation for General Objectives via Online Feedback" ☆40 Updated 2 years ago
- Surrogate-based Hyperparameter Tuning System ☆28 Updated 2 years ago
- A Generic Resource-Aware Hyperparameter Tuning Execution Engine ☆15 Updated 4 years ago
- Predict the performance of LLM inference services ☆21 Updated 4 months ago
- A resilient distributed training framework ☆96 Updated last year
- Metis: Learning to Schedule Long-Running Applications in Shared Container Clusters at Scale ☆19 Updated 5 years ago
- A deep learning-driven scheduler for elastic training in deep learning clusters ☆31 Updated 5 years ago
- A tool to detect infrastructure issues on cloud native AI systems ☆52 Updated 4 months ago
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆135 Updated last year
- ☆19 Updated 8 years ago
- An Efficient Dynamic Resource Scheduler for Deep Learning Clusters ☆41 Updated 8 years ago
- 🔮 Execution time predictions for deep neural network training iterations across different GPUs. ☆63 Updated 3 years ago
- This repo contains the scripts used to create the data for the ATC2020 paper "Reconstructing proprietary video streaming algorithms" ☆14 Updated 4 years ago
- A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup ☆35 Updated 3 years ago
- Selected Topics in Computer Networks @ Johns Hopkins University ☆19 Updated 5 years ago