heyfey / vodascheduler

GPU scheduler for elastic/distributed deep learning workloads in a Kubernetes cluster (IC2E'23)

☆35

Alternatives and similar repositories for vodascheduler

Users who are interested in vodascheduler are comparing it to the libraries listed below.

kubedl-io / morphling
Automatic tuning for ML model deployment on Kubernetes
☆80Updated 9 months ago
Mr-Linus / SCV
SCV is a distributed cluster GPU sniffer.
☆21Updated 2 years ago
alibaba / GPU-scheduler-for-deep-learning
GPU-scheduler-for-deep-learning
☆210Updated 4 years ago
kzhang28 / Optimus
An Efficient Dynamic Resource Scheduler for Deep Learning Clusters
☆42Updated 7 years ago
houminz / paper-reading
Paper Reading: covering distributed systems, virtualization, networking, and machine learning
☆23Updated 4 years ago
k82cn / kubesim
A simulator of Kubernetes for batch and service workloads.
☆47Updated 4 years ago
HiEST / gpu-topo-aware
GPU topology-aware scheduler
☆13Updated 8 years ago
NTHU-LSALAB / Gemini
An efficient GPU resource sharing system with fine-grained control for Linux platforms.
☆84Updated last year
pokerfaceSad / GPUMounter
A Kubernetes plugin that enables dynamically adding or removing GPU resources for a running Pod
☆127Updated 3 years ago
microsoft / hivedscheduler
Kubernetes Scheduler for Deep Learning
☆263Updated 3 years ago
king-jingxiang / pod-gpushare-metrics-exporter
Forked from
☆11Updated 4 years ago
ds2-lab / FaaSNet
FaaSNet: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute (USENIX ATC'21)
☆55Updated 3 years ago
NTHU-LSALAB / KubeShare
Share GPU between Pods in Kubernetes
☆211Updated 2 years ago
NVIDIA / go-gpuallocator
Go Abstraction for Allocating NVIDIA GPUs with Custom Policies
☆116Updated last week
IBM / autopilot
A tool to detect infrastructure issues on cloud native AI systems
☆44Updated 2 weeks ago
SymbioticLab / Tiresias
Tiresias is a GPU cluster manager for distributed deep learning training.
☆156Updated 5 years ago
volcano-sh / devices
Device plugins for Volcano, e.g. GPU
☆126Updated 4 months ago
BaizeAI / kcover
🧯 Kubernetes coverage for fault awareness and recovery, works for any LLMOps, MLOps, AI workloads.
☆31Updated last week
elastic-ai / elastic-gpu-scheduler
elastic-gpu-scheduler is a Kubernetes scheduler extender for GPU resources scheduling.
☆142Updated 2 years ago
volcano-sh / apis
The API (CRD) of Volcano
☆44Updated this week
tkestack / go-nvml
☆32Updated 4 years ago
NVIDIA / knavigator
knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes.
☆69Updated 3 weeks ago
hkust-adsl / kubernetes-scheduler-simulator
Kubernetes Scheduler Simulator
☆114Updated last year
Bruce-Lee-LY / cuda_hook
Hooked CUDA-related dynamic libraries by using automated code generation tools.
☆165Updated last year
pkusys / TGS
Artifacts for our NSDI'23 paper TGS
☆81Updated last year
PaddleFlow / ElasticServing
Elastic Serverless Serving based on Kubernetes, providing 0-instance serving capability.
☆11Updated 3 years ago
llm-d / llm-d-inference-sim
A lightweight vLLM simulator for mocking out replicas.
☆31Updated this week
stanford-futuredata / gavel
Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020
☆127Updated last year
CentaurusInfra / alnair
Intelligent platform for AI workloads
☆37Updated 2 years ago
alwqx / liang
A Kubernetes Scheduler Extender with two custom scheduling algorithms, BNP and CMDN.
☆35Updated 5 months ago