NVIDIA / gpu-operator
NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
☆2,139 · Updated this week
Alternatives and similar repositories for gpu-operator
Users who are interested in gpu-operator are comparing it to the repositories listed below.
- NVIDIA device plugin for Kubernetes ☆3,238 · Updated this week
- NVIDIA GPU metrics exporter for Prometheus leveraging DCGM ☆1,209 · Updated this week
- Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes ☆363 · Updated this week
- GPU Sharing Scheduler for Kubernetes Cluster ☆1,472 · Updated last year
- GPU plugin to the node feature discovery for Kubernetes ☆300 · Updated last year
- Kubernetes-native Job Queueing ☆1,798 · Updated this week
- Tools for monitoring NVIDIA GPUs on Linux ☆1,040 · Updated 3 years ago
- Node feature discovery for Kubernetes ☆884 · Updated this week
- Kubeflow Deployment Manifests ☆913 · Updated this week
- Kubernetes (k8s) device plugin to enable registration of AMD GPU to a container cluster ☆327 · Updated this week
- Distributed ML Training and Fine-Tuning on Kubernetes ☆1,792 · Updated this week
- KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale ☆590 · Updated this week
- Repository for out-of-tree scheduler plugins based on scheduler framework. ☆1,203 · Updated last week
- A batch scheduler of kubernetes for high performance workload, e.g. AI/ML, BigData, HPC ☆1,091 · Updated 2 years ago
- Dynamically provisioning persistent local storage with Kubernetes ☆2,489 · Updated last month
- This driver allows Kubernetes to access NFS server on Linux node. ☆1,033 · Updated last week
- Simple Kubernetes Operator for MinIO clusters ☆1,297 · Updated this week
- GPU Sharing Device Plugin for Kubernetes Cluster ☆481 · Updated 2 years ago
- ☆875 · Updated last year
- Kubernetes Control Plane Virtual IP and Load-Balancer ☆2,411 · Updated this week
- Repo for the controller-runtime subproject of kubebuilder (sig-apimachinery) ☆2,721 · Updated this week
- Tools for building GPU clusters ☆1,356 · Updated last month
- Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.) ☆480 · Updated 3 weeks ago
- CSI driver for Ceph ☆1,394 · Updated this week
- NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs ☆515 · Updated last month
- A Cloud Native Batch System (Project under CNCF) ☆4,696 · Updated this week
- An implementation of the custom.metrics.k8s.io API using Prometheus ☆1,994 · Updated last month
- Repository for the next iteration of composite service (e.g. Ingress) and load balancing APIs. ☆2,094 · Updated this week
- Heterogeneous AI Computing Virtualization Middleware (Project under CNCF) ☆1,704 · Updated this week
- Static provisioner of local volumes ☆1,133 · Updated last month