stackhpc / slurm-k8s-cluster
A Slurm cluster for Kubernetes
☆61 Updated 11 months ago
Alternatives and similar repositories for slurm-k8s-cluster
Users who are interested in slurm-k8s-cluster are comparing it to the repositories listed below.
- Run Slurm on Kubernetes. A Slinky project. ☆124 Updated 2 weeks ago
- MIG Partition Editor for NVIDIA GPUs ☆202 Updated this week
- NVIDIA Network Operator ☆262 Updated this week
- ☆253 Updated 2 weeks ago
- NVIDIA k8s device plugin for Kubevirt ☆256 Updated last week
- GPU plugin to the node feature discovery for Kubernetes ☆301 Updated last year
- Run Slurm in Kubernetes ☆246 Updated this week
- NVIDIA DRA Driver for GPUs ☆392 Updated this week
- JobSet: a k8s native API for distributed ML training and HPC workloads ☆241 Updated last week
- A Lustre container storage interface that allows Kubernetes to mount/unmount provisioned Lustre filesystems into containers. ☆34 Updated 2 months ago
- ☆124 Updated this week
- ☆62 Updated last week
- Slurm in Kubernetes ☆42 Updated 7 months ago
- ☆26 Updated 2 weeks ago
- Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.) ☆487 Updated last month
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆115 Updated this week
- Holistic job manager on Kubernetes ☆116 Updated last year
- K8s device plugin for GPU sharing ☆98 Updated 2 years ago
- Singularity implementation of k8s operator for interacting with SLURM. ☆117 Updated 4 years ago
- Run cloud native workloads on NVIDIA GPUs ☆186 Updated 2 months ago
- The NVIDIA GPU driver container allows the provisioning of the NVIDIA driver through the use of containers. ☆115 Updated last week
- RDMA CNI plugin for containerized workloads ☆55 Updated 2 weeks ago
- InstaSlice Operator facilitates slicing of accelerators using stable APIs ☆41 Updated this week
- Kubernetes (k8s) device plugin to enable registration of AMD GPU to a container cluster ☆333 Updated 2 weeks ago
- ☆276 Updated last week
- Device plugins for Volcano, e.g. GPU ☆125 Updated 3 months ago
- ☆159 Updated 3 weeks ago
- NVIDIA NCCL Tests for Distributed Training ☆97 Updated 2 weeks ago
- This repo includes everything you need to know about deploying GPU nodes on OCI ☆32 Updated this week
- Kubernetes Operator, ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. ☆102 Updated last week