oracle-quickstart / oci-hpc-oke
This repo includes everything you need to know about deploying GPU nodes on OCI
(☆25, updated this week)
Alternatives and similar repositories for oci-hpc-oke
Users interested in oci-hpc-oke also compare it to the libraries listed below:
- NVIDIA NCCL Tests for Distributed Training (☆85, updated last week)
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. (☆88, updated this week)
- JobSet: a k8s-native API for distributed ML training and HPC workloads (☆194, updated this week)
- knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes. (☆64, updated last week)
- Gateway API Inference Extension (☆183, updated this week)
- Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes (☆330, updated this week)
- MIG Partition Editor for NVIDIA GPUs (☆191, updated this week)
- Example DRA driver that developers can fork and modify to get started writing their own. (☆63, updated this week)
- A Slurm cluster for Kubernetes (☆55, updated 7 months ago)
- RDMA CNI plugin for containerized workloads (☆51, updated last week)
- K8s device plugin for GPU sharing (☆100, updated last year)
- A toolkit for discovering cluster network topology. (☆39, updated this week)
- LeaderWorkerSet: An API for deploying a group of pods as a unit of replication (☆339, updated this week)
- This project provides a framework that runs Slurm in Kubernetes. (☆65, updated this week)
- A federation scheduler for multi-cluster (☆33, updated last month)
- Go Abstraction for Allocating NVIDIA GPUs with Custom Policies (☆113, updated 8 months ago)
- Golang bindings for NVIDIA Datacenter GPU Manager (DCGM) (☆104, updated 2 weeks ago)
- CUDA checkpoint and restore utility (☆310, updated last month)
- elastic-gpu-agent is a Kubernetes device plugin for GPU resources allocation on node. (☆54, updated 2 years ago)
- Holistic job manager on Kubernetes (☆112, updated last year)
- GPU plugin to the Node Feature Discovery for Kubernetes (☆298, updated 9 months ago)
- IP over InfiniBand (IPoIB) CNI Plugin (☆12, updated last week)
- A collection of community maintained NRI plugins (☆75, updated this week)
- Module, Model, and Tensor Serialization/Deserialization (☆220, updated last month)
- Enabling Kubernetes to make pod placement decisions with platform intelligence. (☆174, updated last month)