Metis: Learning to Schedule Long-Running Applications in Shared Container Clusters at Scale
☆19May 27, 2020Updated 5 years ago
Alternatives and similar repositories for Metis
Users who are interested in Metis compare it to the repositories listed below
- A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup☆35Jan 9, 2023Updated 3 years ago
- ☆47Jan 18, 2021Updated 5 years ago
- ☆24Mar 20, 2021Updated 4 years ago
- ☆31Jan 21, 2021Updated 5 years ago
- Code for reproducing experiments performed for Accordion☆13Jun 11, 2021Updated 4 years ago
- ☆14Mar 29, 2020Updated 5 years ago
- This repo contains the scripts used to create the data for the ATC2020 paper "Reconstructing proprietary video streaming algorithms"☆14Mar 24, 2021Updated 4 years ago
- A Generic Resource-Aware Hyperparameter Tuning Execution Engine☆15Jan 8, 2022Updated 4 years ago
- A new version for Pytheas (formerly DDN), a control platform for enabling data-driven control for network applications☆14Nov 28, 2016Updated 9 years ago
- Code repository of GreenABR for MMSys 2022 submission☆13Apr 6, 2022Updated 3 years ago
- ☆29Apr 13, 2019Updated 6 years ago
- Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion☆32May 15, 2024Updated last year
- RLScheduler: An Automated HPC Batch Job Scheduler Using Reinforcement Learning [SC'20]☆66May 30, 2023Updated 2 years ago
- ☆37Apr 15, 2023Updated 2 years ago
- Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024)☆19May 28, 2024Updated last year
- A Deep Learning Cluster Scheduler☆37Jan 11, 2021Updated 5 years ago
- Advanced job scheduling simulator☆18Nov 6, 2023Updated 2 years ago
- ☆20Jun 3, 2023Updated 2 years ago
- ☆47Jan 11, 2023Updated 3 years ago
- A collection of awesome and useful resources for research.☆25Jun 5, 2025Updated 8 months ago
- Getting Started with NIMBUS-CORE☆10Dec 16, 2023Updated 2 years ago
- *flow source code☆23Aug 27, 2020Updated 5 years ago
- Interpreting Deep Learning-Based Networking Systems (SIGCOMM 2020)☆92May 28, 2025Updated 9 months ago
- Real-time IoT Benchmark Suite☆50Mar 25, 2018Updated 7 years ago
- ☆23Jan 7, 2022Updated 4 years ago
- ☆21Dec 8, 2022Updated 3 years ago
- RL-Scope: Cross-Stack Profiling for Deep Reinforcement Learning Workloads☆47Apr 7, 2021Updated 4 years ago
- SelfTune is an RL framework that enables systems and service developers to automatically tune various configuration parameters and other …☆46May 31, 2024Updated last year
- HeliosArtifact☆22Sep 27, 2022Updated 3 years ago
- Towards Universal Internet Congestion Control Benchmarking ...☆23Aug 11, 2023Updated 2 years ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning☆25May 12, 2025Updated 9 months ago
- Record and replay for cellular network emulation☆30May 6, 2025Updated 9 months ago
- Source code for the paper: "A Latency-Predictable Multi-Dimensional Optimization Framework for DNN-driven Autonomous Systems"☆22Jan 4, 2021Updated 5 years ago
- ☆95Apr 25, 2023Updated 2 years ago
- A resilient distributed training framework☆97Apr 11, 2024Updated last year
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]☆24Nov 21, 2024Updated last year
- Objective Quality-of-Experience Model Benchmark☆26Feb 26, 2020Updated 6 years ago
- ☆31Jul 18, 2019Updated 6 years ago
- [ACM EuroSys 2023] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access☆56Aug 6, 2025Updated 6 months ago