nebius / soperator
Run Slurm in Kubernetes
☆205 Updated this week
Alternatives and similar repositories for soperator:
Users who are interested in soperator compare it to the repositories listed below
- This project provides a framework that runs Slurm in Kubernetes. ☆75 Updated 2 weeks ago
- ☆37 Updated this week
- KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale ☆483 Updated this week
- JobSet: a k8s native API for distributed ML training and HPC workloads ☆218 Updated last week
- Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes ☆346 Updated this week
- Kubernetes Operator, ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. ☆93 Updated this week
- A Slurm cluster for Kubernetes ☆55 Updated 8 months ago
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆92 Updated this week
- Slurm in Kubernetes ☆41 Updated 4 months ago
- MIG Partition Editor for NVIDIA GPUs ☆194 Updated this week
- LeaderWorkerSet: An API for deploying a group of pods as a unit of replication ☆402 Updated this week
- CUDA checkpoint and restore utility ☆325 Updated 2 months ago
- ☆126 Updated this week
- InstaSlice Operator facilitates slicing of accelerators using stable APIs ☆33 Updated this week
- ☆105 Updated 3 weeks ago
- A toolkit for discovering cluster network topology. ☆45 Updated 2 weeks ago
- Holistic job manager on Kubernetes ☆115 Updated last year
- K8s device plugin for GPU sharing ☆100 Updated last year
- GPU plugin to the node feature discovery for Kubernetes ☆300 Updated 10 months ago
- Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.) ☆474 Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆88 Updated last week
- GenAI inference performance benchmarking tool ☆36 Updated 3 weeks ago
- Module, Model, and Tensor Serialization/Deserialization ☆223 Updated 2 months ago
- This repo includes everything you need to know about deploying GPU nodes on OCI ☆26 Updated this week
- knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes. ☆65 Updated this week
- Gateway API Inference Extension ☆229 Updated this week
- User documentation for KServe. ☆106 Updated this week
- Run cloud native workloads on NVIDIA GPUs ☆168 Updated last week
- ☆24 Updated 3 weeks ago
- elastic-gpu-scheduler is a Kubernetes scheduler extender for GPU resources scheduling. ☆140 Updated 2 years ago