NVIDIA / k8s-nim-operator
An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment.
☆87 · Updated this week
Alternatives and similar repositories for k8s-nim-operator:
Users interested in k8s-nim-operator are comparing it to the repositories listed below.
- Example DRA driver that developers can fork and modify to get started writing their own. ☆63 · Updated this week
- InstaSlice facilitates the use of Dynamic Resource Allocation (DRA) on Kubernetes clusters for GPU sharing. ☆27 · Updated 3 months ago
- JobSet: a k8s-native API for distributed ML training and HPC workloads. ☆194 · Updated this week
- InstaSlice Operator facilitates slicing of accelerators using stable APIs. ☆29 · Updated this week
- Kubernetes Operator, Ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. ☆89 · Updated last week
- Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes. ☆327 · Updated this week
- ☆34 · Updated this week
- Containerization and cloud-native suite for OPEA. ☆43 · Updated this week
- knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes. ☆62 · Updated last month
- ☆50 · Updated last year
- ☆107 · Updated this week
- Holistic job manager on Kubernetes. ☆112 · Updated last year
- Gateway API Inference Extension. ☆176 · Updated this week
- MIG Partition Editor for NVIDIA GPUs. ☆189 · Updated this week
- K8s device plugin for GPU sharing. ☆100 · Updated last year
- ☆112 · Updated this week
- GenAI inference performance benchmarking tool. ☆19 · Updated this week
- This repo includes everything you need to know about deploying GPU nodes on OCI. ☆25 · Updated last week
- The NVIDIA GPU driver container allows the provisioning of the NVIDIA driver through the use of containers. ☆94 · Updated this week
- Controller for ModelMesh. ☆224 · Updated 2 weeks ago
- The kernel module management operator builds, signs, and loads kernel modules in Kubernetes clusters. ☆96 · Updated this week
- ☆19 · Updated last week
- Run cloud-native workloads on NVIDIA GPUs. ☆162 · Updated 2 weeks ago
- LeaderWorkerSet: an API for deploying a group of pods as a unit of replication. ☆324 · Updated this week
- ☆85 · Updated 6 months ago
- Repository for the Open Inference Protocol specification. ☆48 · Updated 7 months ago
- This project provides a framework that runs Slurm in Kubernetes. ☆64 · Updated last week
- Enabling Kubernetes to make pod placement decisions with platform intelligence. ☆174 · Updated last month
- A topology-aware custom scheduler for Kubernetes. ☆63 · Updated last year
- ☆23 · Updated 3 weeks ago