NVIDIA / cloud-native-stack
Run cloud native workloads on NVIDIA GPUs
☆164 · Updated this week
Alternatives and similar repositories for cloud-native-stack:
Users interested in cloud-native-stack are comparing it to the repositories listed below:
- NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs ☆480 · Updated last month
- MIG Partition Editor for NVIDIA GPUs ☆190 · Updated this week
- Kubernetes Operator, ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. ☆89 · Updated this week
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆88 · Updated this week
- NVIDIA Network Operator ☆243 · Updated this week
- GPU plugin for Node Feature Discovery in Kubernetes ☆298 · Updated 9 months ago
- markdown docs ☆82 · Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆84 · Updated last week
- Tools to deploy GPU clusters in the Cloud ☆31 · Updated last year
- Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes ☆330 · Updated this week
- The NVIDIA GPU driver container allows the provisioning of the NVIDIA driver through the use of containers. ☆99 · Updated this week
- ☆237 · Updated this week
- ☆330 · Updated 10 months ago
- A plugin that lets EC2 developers use libfabric as the network provider when running NCCL applications. ☆167 · Updated this week
- NVIDIA k8s device plugin for Kubevirt ☆248 · Updated last week
- A Slurm cluster for Kubernetes ☆55 · Updated 7 months ago
- Controller for ModelMesh ☆225 · Updated 3 weeks ago
- ☆43 · Updated 6 months ago
- ☆23 · Updated last month
- ☆82 · Updated 3 months ago
- ☆42 · Updated 10 months ago
- Share GPU between Pods in Kubernetes ☆210 · Updated 2 years ago
- Splits a single NVIDIA GPU into multiple partitions with complete compute and memory isolation (with respect to performance) between the partitions ☆158 · Updated 5 years ago
- Magnum IO community repo ☆85 · Updated 2 months ago
- A top-like tool for monitoring GPUs in a cluster ☆86 · Updated last year
- Kubernetes (k8s) device plugin to enable registration of AMD GPU to a container cluster ☆312 · Updated last week
- Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.) ☆469 · Updated last week
- ☆60 · Updated last week
- ☆248 · Updated this week
- CloudAI Benchmark Framework ☆59 · Updated this week