NVIDIA/deepops

NVIDIA / deepops

Tools for building GPU clusters

☆1,461

Alternatives and similar repositories for deepops

Users that are interested in deepops are comparing it to the libraries listed below.

NVIDIA / pyxis
Container plugin for Slurm Workload Manager
☆453 · May 12, 2026 · Updated 2 months ago
NVIDIA / enroot
A simple yet powerful tool to turn traditional container/OS images into unprivileged sandboxes.
☆978 · Jun 9, 2026 · Updated last month
NVIDIA / gpu-operator
NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
☆2,792 · Updated this week
NVIDIA / k8s-device-plugin
NVIDIA device plugin for Kubernetes
☆3,819 · Updated this week
NVIDIA / gpu-monitoring-tools
Tools for monitoring NVIDIA GPUs on Linux
☆1,075 · Nov 2, 2021 · Updated 4 years ago
NVIDIA / DCGM
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
☆762 · Jul 6, 2026 · Updated last week
rackslab / Slurm-web
Open source web interface for Slurm HPC & AI clusters
☆589 · Jun 23, 2026 · Updated 3 weeks ago
vpenso / prometheus-slurm-exporter
Prometheus exporter for performance metrics from Slurm.
☆286 · Jun 20, 2024 · Updated 2 years ago
dell / omnia
An open-source toolkit for deploying and managing high performance clusters for HPC, AI, and data analytics workloads.
☆295 · Updated this week
SchedMD / slurm
Slurm: A Highly Scalable Workload Manager
☆4,157 · Updated this week
triton-inference-server / server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
☆10,851 · Updated this week
dholt / slurm-gpu
Scheduling GPU cluster workloads with Slurm
☆78 · Nov 5, 2018 · Updated 7 years ago
sylabs / wlm-operator
Singularity implementation of k8s operator for interacting with SLURM.
☆118 · Dec 29, 2020 · Updated 5 years ago
kubeflow / mpi-operator
Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.)
☆530 · Updated this week
lopentusska / slurm_ubuntu_gpu_cluster
Instructions for setting up a Slurm gpu cluster on Ubuntu 22.04.
☆31 · Feb 29, 2024 · Updated 2 years ago
mej / nhc
LBNL Node Health Check
☆284 · Apr 7, 2026 · Updated 3 months ago
OleHolmNielsen / Slurm_tools
My tools for the Slurm HPC workload manager
☆583 · Updated this week
stackhpc / ansible-role-openhpc
Ansible role for OpenHPC
☆51 · Jul 9, 2026 · Updated last week
NVIDIA / mig-parted
MIG Partition Editor for NVIDIA GPUs
☆260 · Updated this week
AliyunContainerService / gpushare-scheduler-extender
GPU Sharing Scheduler for Kubernetes Cluster
☆1,535 · Dec 29, 2023 · Updated 2 years ago
NVIDIA / hpc-container-maker
HPC Container Maker
☆515 · May 29, 2026 · Updated last month
NVIDIA / aistore
AIStore: scalable storage for AI applications
☆1,895 · Updated this week
NVIDIA / nephele
Tools to deploy GPU clusters in the Cloud
☆34 · Apr 4, 2023 · Updated 3 years ago
NVIDIA / gpu-feature-discovery
GPU plugin to the node feature discovery for Kubernetes
☆309 · May 27, 2024 · Updated 2 years ago
NVIDIA / ngc-container-replicator
NGC Container Replicator
☆29 · Dec 26, 2022 · Updated 3 years ago
Mellanox / ib-kubernetes
☆78 · Updated this week
NVIDIA / cloud-native-stack
Run cloud native workloads on NVIDIA GPUs
☆239 · Updated this week
horovod / horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
☆14,695 · Jun 20, 2026 · Updated 3 weeks ago
mknoxnv / ubuntu-slurm
Steps to create a small slurm cluster with GPU enabled nodes
☆273 · Feb 2, 2023 · Updated 3 years ago
galaxyproject / ansible-slurm
Ansible role for installing and managing the Slurm Workload Manager
☆120 · Nov 24, 2025 · Updated 7 months ago
kubeflow / kubeflow
Machine Learning Toolkit for Kubernetes
☆15,781 · Jul 10, 2026 · Updated last week
NVIDIA / nvidia-container-runtime
NVIDIA container runtime
☆1,127 · Oct 27, 2023 · Updated 2 years ago
kserve / kserve
Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes
☆5,707 · Updated this week
microsoft / pai
Resource scheduling and cluster management for AI
☆2,684 · Jun 6, 2024 · Updated 2 years ago
Mellanox / nccl-rdma-sharp-plugins
RDMA and SHARP plugins for nccl library
☆233 · Apr 3, 2026 · Updated 3 months ago
NVIDIA / nvidia-docker
Build and run Docker containers leveraging NVIDIA GPUs
☆17,581 · Dec 6, 2023 · Updated 2 years ago
stackhpc / ansible-slurm-appliance
A Slurm-based HPC workload management environment, driven by Ansible.
☆72 · Updated this week
NVIDIA / libnvidia-container
NVIDIA container runtime library
☆1,117 · Updated this week
microsoft / hivedscheduler
Kubernetes Scheduler for Deep Learning
☆263 · May 22, 2022 · Updated 4 years ago