gajagajago / deepshare
Network Contention-Aware Cluster Scheduling with Reinforcement Learning (IEEE ICPADS 2023)
☆16Updated 5 months ago
Alternatives and similar repositories for deepshare:
Users who are interested in deepshare are comparing it to the repositories listed below
- ☆102Updated last year
- ☆12Updated last week
- Official Github repository for the SIGCOMM '24 paper "Accelerating Model Training in Multi-cluster Environments with Consumer-grade GPUs"☆71Updated 9 months ago
- (ICPP '20) ShadowTutor: Distributed Partial Distillation for Mobile Video DNN Inference☆12Updated 4 years ago
- [ACM EuroSys '23] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access☆57Updated last year
- ☆24Updated 6 years ago
- ☆26Updated 2 years ago
- ☆24Updated last year
- Welcome to PeriFlow CLI ☁︎☆12Updated last year
- LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale☆103Updated last month
- "JABAS: Joint Adaptive Batching and Automatic Scaling for DNN Training on Heterogeneous GPUs" (EuroSys '25)☆13Updated last week
- Helios Traces from SenseTime☆53Updated 2 years ago
- Know Your Enemy To Save Cloud Energy: Energy-Performance Characterization of Machine Learning Serving (HPCA '23)☆13Updated 3 months ago
- ☆47Updated 3 months ago
- ☆16Updated last year
- ☆49Updated 2 years ago
- FastFlow is a system that automatically detects CPU bottlenecks in deep learning training pipelines and resolves the bottlenecks with dat…☆26Updated 2 years ago
- ☆41Updated 9 months ago
- ☆64Updated 2 weeks ago
- ☆45Updated 7 months ago
- ☆20Updated 3 years ago
- Lucid: A Non-Intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs☆53Updated last year
- ☆16Updated 11 months ago
- HeliosArtifact☆20Updated 2 years ago
- Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling☆10Updated last year
- BATCH: Adaptive Batching for Efficient Machine Learning Serving on Serverless Platforms☆9Updated 3 years ago
- ☆37Updated 3 years ago
- Code for "Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP", which appeared at SOSP 2021☆25Updated 3 years ago
- Metis: Learning to Schedule Long-Running Applications in Shared Container Clusters at Scale☆18Updated 4 years ago
- zTT: Learning-based DVFS with Zero Thermal Throttling for Mobile Devices [MobiSys'21] - Artifact Evaluation☆24Updated 3 years ago