oracle-quickstart / oci-hpc-oke
This repo includes everything you need to know about deploying GPU nodes on OCI
☆19 · Updated this week
Related projects
Alternatives and complementary repositories for oci-hpc-oke
- NVIDIA NCCL Tests for Distributed Training ☆70 · Updated 2 weeks ago
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆58 · Updated this week
- JobSet: a k8s native API for distributed ML training and HPC workloads ☆152 · Updated this week
- LeaderWorkerSet: An API for deploying a group of pods as a unit of replication ☆142 · Updated this week
- ☆57 · Updated 2 months ago
- ☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work! ☆30 · Updated this week
- The NVIDIA GPU driver container allows the provisioning of the NVIDIA driver through the use of containers. ☆75 · Updated this week
- MIG Partition Editor for NVIDIA GPUs ☆174 · Updated this week
- Go Abstraction for Allocating NVIDIA GPUs with Custom Policies ☆108 · Updated 4 months ago
- RDMA CNI plugin for containerized workloads ☆41 · Updated 2 months ago
- Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes ☆273 · Updated last week
- GPU plugin to the node feature discovery for Kubernetes ☆292 · Updated 5 months ago
- knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes. ☆54 · Updated 2 weeks ago
- Example DRA driver that developers can fork and modify to get them started writing their own. ☆53 · Updated 2 weeks ago
- ☆83 · Updated 2 months ago
- Holistic job manager on Kubernetes ☆108 · Updated 9 months ago
- Golang bindings for Nvidia Datacenter GPU Manager (DCGM) ☆95 · Updated 2 months ago
- LLM Instance gateway implementation. ☆81 · Updated this week
- Enabling Kubernetes to make pod placement decisions with platform intelligence. ☆171 · Updated 5 months ago
- ☆25 · Updated 2 months ago
- A collection of community maintained NRI plugins ☆66 · Updated this week
- ☆199 · Updated 3 weeks ago
- The kernel module management operator builds, signs and loads kernel modules in Kubernetes clusters.