pengyanghua / DL2

A deep learning-driven scheduler for elastic training in deep learning clusters

☆30

Alternatives and similar repositories for DL2

Users who are interested in DL2 are comparing it to the repositories listed below


stanford-futuredata / POP
Code for "Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP", which appeared at SOSP 2021
☆26 Updated 3 years ago
pengyanghua / optimus
A Deep Learning Cluster Scheduler
☆39 Updated 4 years ago
S-Lab-System-Group / ChronusArtifact
☆22 Updated 3 years ago
hiddenlayer2020 / ML-Job-Scheduler-MLFS
☆11 Updated 4 years ago
DIR-LAB / deep-batch-scheduler
RLScheduler: An Automated HPC Batch Job Scheduler Using Reinforcement Learning [SC'20]
☆61 Updated 2 years ago
reconfigurable-ml-pipeline / ipa
Source code of IPA, https://escholarship.org/uc/item/2p0805dq
☆10 Updated last year
S-Lab-System-Group / HeliosArtifact
HeliosArtifact
☆20 Updated 2 years ago
S-Lab-System-Group / HeliosData
Helios Traces from SenseTime
☆56 Updated 2 years ago
hkust-adsl / kubernetes-scheduler-simulator
Kubernetes Scheduler Simulator
☆114 Updated last year
hongzimao / decima-sim
Learning Scheduling Algorithms for Data Processing Clusters
☆310 Updated 4 years ago
msr-fiddle / blox
☆44 Updated last year
lwangbm / Metis
Metis: Learning to Schedule Long-Running Applications in Shared Container Clusters at Scale
☆18 Updated 5 years ago
XiaofeiTJU / KaiS
☆46 Updated 4 years ago
siasosp23 / artifacts
☆20 Updated last year
S-Lab-System-Group / Lucid
Lucid: A Non-Intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs
☆55 Updated 2 years ago
S-Lab-System-Group / Hydro
Surrogate-based Hyperparameter Tuning System
☆28 Updated 2 years ago
SymbioticLab / Tiresias
Tiresias is a GPU cluster manager for distributed deep learning training.
☆155 Updated 5 years ago
SymbioticLab / ModelKeeper
A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup
☆35 Updated 2 years ago
msr-fiddle / synergy
☆51 Updated 2 years ago
stanford-futuredata / gavel
Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020
☆128 Updated last year
S-Lab-System-Group / Awesome-DL-Scheduling-Papers
☆300 Updated last year
Rivendile / Muri
Artifacts for our SIGCOMM'22 paper Muri
☆42 Updated last year
romilbhardwaj / cilantro
Source code for OSDI 2023 paper titled "Cilantro - Performance-Aware Resource Allocation for General Objectives via Online Feedback"
☆39 Updated 2 years ago
artpad6 / gemel_nsdi23
☆21 Updated last year
uw-mad-dash / shockwave
Artifact for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23]
☆44 Updated 2 years ago
MincYu / gillis-open-source
☆26 Updated 2 years ago
msr-fiddle / philly-traces
☆191 Updated 5 years ago
kzhang28 / Optimus
An Efficient Dynamic Resource Scheduler for Deep Learning Clusters
☆42 Updated 7 years ago
tonyzhao-jt / LLM-PQ
Official Repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization"
☆34 Updated 3 weeks ago
usc-isi / PipeEdge
PipeEdge: Pipeline Parallelism for Large-Scale Model Inference on Heterogeneous Edge Devices
☆35 Updated last year