NVIDIA / ais-k8s
Kubernetes Operator, Ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes.
☆93 · Updated this week
Alternatives and similar repositories for ais-k8s:
Users interested in ais-k8s are comparing it to the libraries listed below.
- This project provides a framework that runs Slurm in Kubernetes. ☆75 · Updated 2 weeks ago
- Run cloud native workloads on NVIDIA GPUs. ☆168 · Updated last week
- A top-like tool for monitoring GPUs in a cluster. ☆86 · Updated last year
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆92 · Updated this week
- Holistic job manager on Kubernetes. ☆115 · Updated last year
- Repository for the Open Inference Protocol specification. ☆53 · Updated 8 months ago
- A Slurm cluster for Kubernetes. ☆55 · Updated 8 months ago
- MIG Partition Editor for NVIDIA GPUs. ☆194 · Updated this week
- InstaSlice facilitates the use of Dynamic Resource Allocation (DRA) on Kubernetes clusters for GPU sharing. ☆27 · Updated 4 months ago
- GPU plugin for Node Feature Discovery in Kubernetes. ☆300 · Updated 10 months ago
- K8s device plugin for GPU sharing. ☆100 · Updated last year
- The NVIDIA GPU driver container allows the NVIDIA driver to be provisioned through containers. ☆106 · Updated this week
- JobSet: a k8s-native API for distributed ML training and HPC workloads. ☆218 · Updated this week
- Run Slurm in Kubernetes. ☆205 · Updated this week
- ☆24 · Updated 3 weeks ago
- The NVIDIA Driver Manager is a Kubernetes component that assists in seamless upgrades of the NVIDIA driver on each node of the cluster. ☆35 · Updated 3 weeks ago
- Enabling Kubernetes to make pod placement decisions with platform intelligence. ☆174 · Updated 2 months ago
- Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes. ☆346 · Updated this week
- Markdown docs. ☆86 · Updated this week
- A toolkit for discovering cluster network topology. ☆45 · Updated 2 weeks ago
- Custom scheduler to deploy ML models to TRTIS for GPU sharing. ☆12 · Updated 5 years ago
- Module, model, and tensor serialization/deserialization. ☆223 · Updated 2 months ago
- Provides for deploying custom ETL containers on AIStore, with subsequent user-defined extraction-transformation-loading in parallel, on t… ☆16 · Updated this week
- Unified runtime-adapter image of the sidecar containers which run in the modelmesh pods. ☆21 · Updated last month
- A topology-aware custom scheduler for Kubernetes. ☆63 · Updated last year
- CUDA checkpoint and restore utility. ☆325 · Updated 2 months ago
- elastic-gpu-scheduler is a Kubernetes scheduler extender for GPU resource scheduling. ☆140 · Updated 2 years ago
- GenAI inference performance benchmarking tool. ☆36 · Updated 2 weeks ago
- ☆105 · Updated 3 weeks ago
- ☆62 · Updated this week