oracle-quickstart / oci-hpc-oke
This repo includes everything you need to know about deploying GPU nodes on OCI
☆39 Updated last week
Alternatives and similar repositories for oci-hpc-oke
Users who are interested in oci-hpc-oke are comparing it to the repositories listed below.
- MIG Partition Editor for NVIDIA GPUs ☆222 Updated last week
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆131 Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆121 Updated last week
- A toolkit for discovering cluster network topology. ☆76 Updated this week
- Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling ☆85 Updated last week
- JobSet: a k8s native API for distributed ML training and HPC workloads ☆276 Updated last week
- NVIDIA DRA Driver for GPUs ☆471 Updated last week
- CUDA checkpoint and restore utility ☆381 Updated last month
- GenAI inference performance benchmarking tool ☆110 Updated this week
- Run Slurm in Kubernetes ☆311 Updated this week
- Run Slurm on Kubernetes. A Slinky project. ☆182 Updated this week
- Inference scheduler for llm-d ☆102 Updated this week
- Run cloud native workloads on NVIDIA GPUs ☆204 Updated last month
- The NVIDIA GPU driver container allows the provisioning of the NVIDIA driver through the use of containers. ☆138 Updated last week
- A Slurm cluster for Kubernetes ☆65 Updated last year
- Gateway API Inference Extension ☆514 Updated this week
- OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs) ☆301 Updated last week
- Golang bindings for Nvidia Datacenter GPU Manager (DCGM) ☆138 Updated 2 weeks ago
- LeaderWorkerSet: An API for deploying a group of pods as a unit of replication ☆604 Updated this week
- ☆264 Updated this week
- llm-d helm charts and deployment examples ☆45 Updated last month
- ☆179 Updated 3 weeks ago
- Cloud Native Benchmarking of Foundation Models ☆44 Updated 3 months ago
- WG Serving ☆31 Updated 3 weeks ago
- KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale ☆881 Updated last week
- Distributed KV cache coordinator ☆82 Updated last week
- Container plugin for Slurm Workload Manager ☆389 Updated last month
- A tool to detect infrastructure issues on cloud native AI systems ☆49 Updated last month
- K8s device plugin for GPU sharing ☆99 Updated 2 years ago
- Health checks for Azure N- and H-series VMs. ☆54 Updated last month