NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
☆2,590 · Mar 16, 2026 · Updated this week
Alternatives and similar repositories for gpu-operator
Users who are interested in gpu-operator are comparing it to the repositories listed below
- NVIDIA device plugin for Kubernetes ☆3,699 · Mar 13, 2026 · Updated last week
- NVIDIA GPU metrics exporter for Prometheus leveraging DCGM ☆1,640 · Feb 25, 2026 · Updated 3 weeks ago
- GPU plugin to the node feature discovery for Kubernetes ☆307 · May 27, 2024 · Updated last year
- GPU Sharing Scheduler for Kubernetes Cluster ☆1,530 · Dec 29, 2023 · Updated 2 years ago
- A Cloud Native Batch System (Project under CNCF) ☆5,381 · Mar 11, 2026 · Updated last week
- Node feature discovery for Kubernetes ☆1,007 · Mar 13, 2026 · Updated last week
- NVIDIA DRA Driver for GPUs ☆585 · Updated this week
- Heterogeneous AI Computing Virtualization Middleware (Project under CNCF) ☆3,091 · Mar 13, 2026 · Updated last week
- NVIDIA Network Operator ☆326 · Updated this week
- Tools for monitoring NVIDIA GPUs on Linux ☆1,068 · Nov 2, 2021 · Updated 4 years ago
- Kubernetes-native Job Queueing ☆2,368 · Updated this week
- Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes ☆5,216 · Updated this week
- A toolkit to run Ray applications on Kubernetes ☆2,371 · Updated this week
- KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale ☆1,181 · Updated this week
- ☆338 · Mar 11, 2026 · Updated last week
- ☆893 · Apr 2, 2024 · Updated last year
- Distributed AI Model Training and LLM Fine-Tuning on Kubernetes ☆2,056 · Updated this week
- Repository for out-of-tree scheduler plugins based on scheduler framework. ☆1,281 · Updated this week
- Kubebuilder - SDK for building Kubernetes APIs using CRDs ☆9,028 · Updated this week
- Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration ☆5,330 · Mar 13, 2026 · Updated last week
- Build and run containers leveraging NVIDIA GPUs ☆4,149 · Updated this week
- Go Bindings for the NVIDIA Management Library (NVML) ☆426 · Feb 12, 2026 · Updated last month
- This is a place for various problem detectors running on the Kubernetes nodes. ☆3,362 · Updated this week
- Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.) ☆516 · Mar 12, 2026 · Updated last week
- Kubernetes Virtualization API and runtime in order to define and manage virtual machines. ☆6,714 · Updated this week
- NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs ☆685 · Feb 17, 2026 · Updated last month
- Tools for building GPU clusters ☆1,424 · Feb 23, 2026 · Updated 3 weeks ago
- GPU Sharing Device Plugin for Kubernetes Cluster ☆492 · Jan 10, 2023 · Updated 3 years ago
- LeaderWorkerSet: An API for deploying a group of pods as a unit of replication ☆677 · Mar 11, 2026 · Updated last week
- NVIDIA container runtime library ☆1,078 · Mar 12, 2026 · Updated last week
- A CNI meta-plugin for multi-homed pods in Kubernetes ☆2,810 · Mar 11, 2026 · Updated last week
- NVIDIA k8s device plugin for Kubevirt ☆278 · Updated this week
- Autoscaling components for Kubernetes ☆8,795 · Updated this week
- Backup and migrate Kubernetes applications and their persistent volumes ☆9,883 · Mar 13, 2026 · Updated last week
- MIG Partition Editor for NVIDIA GPUs ☆244 · Updated this week
- Kubernetes (k8s) device plugin to enable registration of AMD GPU to a container cluster ☆375 · Jan 20, 2026 · Updated last month
- Machine Learning Toolkit for Kubernetes ☆15,519 · Jan 5, 2026 · Updated 2 months ago
- Automated management of large-scale applications on Kubernetes (incubating project under CNCF) ☆5,196 · Mar 11, 2026 · Updated last week
- Prometheus Operator creates/configures/manages Prometheus clusters atop Kubernetes ☆9,855 · Mar 12, 2026 · Updated last week