cake-lab / perseus
☆10 Updated last year
Alternatives and similar repositories for perseus:
Users who are interested in perseus are comparing it to the repositories listed below
- A deep learning-driven scheduler for elastic training in deep learning clusters ☆28 Updated 4 years ago
- ☆20 Updated 3 years ago
- A Deep Learning Cluster Scheduler ☆37 Updated 4 years ago
- Distributed ML Optimizer ☆30 Updated 3 years ago
- A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup ☆34 Updated 2 years ago
- sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data ☆64 Updated 6 months ago
- Machine learning on serverless platform ☆8 Updated 2 years ago
- PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021 ☆55 Updated 3 years ago
- ddl-benchmarks: Benchmarks for Distributed Deep Learning ☆37 Updated 4 years ago
- Herald: Accelerating Neural Recommendation Training with Embedding Scheduling (NSDI 2024) ☆20 Updated 8 months ago
- GPU topology-aware scheduler ☆12 Updated 7 years ago
- Multi-Instance-GPU profiling tool ☆56 Updated last year
- Code for "Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP", which appeared at SOSP 2021 ☆24 Updated 3 years ago
- FTPipe and related pipeline model parallelism research. ☆41 Updated last year
- Model-less Inference Serving ☆83 Updated last year
- This is the (evolving) reading list for the seminar. ☆57 Updated 4 years ago
- ☆11 Updated last year
- Surrogate-based Hyperparameter Tuning System ☆28 Updated last year
- Machine Learning System ☆14 Updated 4 years ago
- This repository contains code for the paper: Bergsma S., Zeyl T., Senderovich A., and Beck J. C., "Generating Complex, Realistic Cloud Wo… ☆42 Updated 3 years ago
- [ICDCS 2023] DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining ☆12 Updated last year
- Metis: Learning to Schedule Long-Running Applications in Shared Container Clusters at Scale ☆17 Updated 4 years ago
- An Efficient Dynamic Resource Scheduler for Deep Learning Clusters ☆42 Updated 7 years ago
- Some microbenchmarks and design docs before commencement ☆12 Updated 3 years ago
- High performance RDMA-based distributed feature collection component for training GNN model on EXTREMELY large graph ☆49 Updated 2 years ago
- ☆43 Updated 3 years ago
- Deadline-based hyperparameter tuning on RayTune. ☆31 Updated 5 years ago
- Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020 ☆126 Updated 6 months ago
- Fine-grained GPU sharing primitives ☆140 Updated 4 years ago
- Distributed training with various deep learning (DL) frameworks, including: Tensorflow, Tensorflow2, Pytorch, Chainer, Caffe, Mxnet ... ☆20 Updated 4 years ago