SlinkyProject / slurm-bridge
Run Slurm as a Kubernetes scheduler. A Slinky project.
☆41 Updated this week
Alternatives and similar repositories for slurm-bridge
Users who are interested in slurm-bridge are comparing it to the repositories listed below.
- Run Slurm on Kubernetes. A Slinky project. ☆169 Updated last week
- A Slurm cluster for Kubernetes ☆63 Updated last year
- MIG Partition Editor for NVIDIA GPUs ☆215 Updated this week
- ☆264 Updated 3 weeks ago
- The NVIDIA GPU driver container allows the provisioning of the NVIDIA driver through the use of containers. ☆134 Updated last week
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆129 Updated this week
- Singularity implementation of k8s operator for interacting with SLURM. ☆117 Updated 4 years ago
- Kubernetes Operator, ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. ☆111 Updated last week
- OCI-compatible engine to deploy Linux containers on HPC environments. ☆138 Updated 11 months ago
- NVIDIA Network Operator ☆283 Updated last week
- Holistic job manager on Kubernetes ☆116 Updated last year
- Slurm in Kubernetes ☆43 Updated 3 weeks ago
- KJob: Tool for CLI-loving ML researchers ☆39 Updated last week
- The Singularity implementation of the Kubernetes Container Runtime Interface ☆114 Updated 4 years ago
- NVIDIA NCCL Tests for Distributed Training ☆111 Updated last week
- A Lustre container storage interface that allows Kubernetes to mount/unmount provisioned Lustre filesystems into containers. ☆37 Updated this week
- GenAI inference performance benchmarking tool ☆97 Updated last week
- Container plugin for Slurm Workload Manager ☆382 Updated this week
- A toolkit for discovering cluster network topology. ☆70 Updated this week
- Helm charts for llm-d ☆50 Updated 2 months ago
- ☆65 Updated 2 weeks ago
- JobSet: a k8s native API for distributed ML training and HPC workloads ☆262 Updated this week
- ☆26 Updated last month
- GPU plugin to the node feature discovery for Kubernetes ☆305 Updated last year
- Run Slurm in Kubernetes ☆287 Updated this week
- Cloud Native Benchmarking of Foundation Models ☆42 Updated 2 months ago
- An open-source toolkit for deploying and managing high performance clusters for HPC, AI, and data analytics workloads. ☆275 Updated this week
- CUDA checkpoint and restore utility ☆371 Updated 2 weeks ago
- ☆26 Updated last month
- This repo includes everything you need to know about deploying GPU nodes on OCI ☆35 Updated this week