Tools for building GPU clusters
☆1,430 · Feb 23, 2026 · Updated last month
Alternatives and similar repositories for deepops
Users who are interested in deepops are comparing it to the repositories listed below.
- Container plugin for Slurm Workload Manager ☆426 · Mar 23, 2026 · Updated 2 weeks ago
- A simple yet powerful tool to turn traditional container/OS images into unprivileged sandboxes. ☆924 · Mar 23, 2026 · Updated 2 weeks ago
- NVIDIA device plugin for Kubernetes ☆3,716 · Apr 2, 2026 · Updated last week
- NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes ☆2,614 · Updated this week
- Tools for monitoring NVIDIA GPUs on Linux ☆1,070 · Nov 2, 2021 · Updated 4 years ago
- NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs ☆701 · Mar 30, 2026 · Updated last week
- Open source web interface for Slurm HPC & AI clusters ☆555 · Updated this week
- Prometheus exporter for performance metrics from Slurm. ☆278 · Jun 20, 2024 · Updated last year
- An open-source toolkit for deploying and managing high performance clusters for HPC, AI, and data analytics workloads. ☆291 · Updated this week
- Slurm: A Highly Scalable Workload Manager ☆3,851 · Updated this week
- The Triton Inference Server provides an optimized cloud and edge inferencing solution. ☆10,507 · Apr 2, 2026 · Updated last week
- Scheduling GPU cluster workloads with Slurm ☆78 · Nov 5, 2018 · Updated 7 years ago
- Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.) ☆519 · Mar 23, 2026 · Updated 2 weeks ago
- Singularity implementation of k8s operator for interacting with SLURM. ☆117 · Dec 29, 2020 · Updated 5 years ago
- My tools for the Slurm HPC workload manager ☆573 · Mar 30, 2026 · Updated last week
- LBNL Node Health Check ☆276 · Apr 18, 2025 · Updated 11 months ago
- Ansible role for OpenHPC ☆51 · Mar 2, 2026 · Updated last month
- MIG Partition Editor for NVIDIA GPUs ☆245 · Updated this week
- AIStore: scalable storage for AI applications ☆1,806 · Updated this week
- GPU Sharing Scheduler for Kubernetes Cluster ☆1,532 · Dec 29, 2023 · Updated 2 years ago
- HPC Container Maker ☆512 · Mar 13, 2026 · Updated 3 weeks ago
- Tools to deploy GPU clusters in the Cloud ☆34 · Apr 4, 2023 · Updated 3 years ago
- GPU plugin to the node feature discovery for Kubernetes ☆307 · May 27, 2024 · Updated last year
- NGC Container Replicator ☆28 · Dec 26, 2022 · Updated 3 years ago
- Instructions for setting up a Slurm gpu cluster on Ubuntu 22.04. ☆31 · Feb 29, 2024 · Updated 2 years ago
- ☆75 · Apr 2, 2026 · Updated last week
- Run cloud native workloads on NVIDIA GPUs ☆231 · Jan 22, 2026 · Updated 2 months ago
- Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. ☆14,686 · Dec 1, 2025 · Updated 4 months ago
- Steps to create a small slurm cluster with GPU enabled nodes ☆272 · Feb 2, 2023 · Updated 3 years ago
- Machine Learning Toolkit for Kubernetes ☆15,552 · Jan 5, 2026 · Updated 3 months ago
- Ansible role for installing and managing the Slurm Workload Manager ☆116 · Nov 24, 2025 · Updated 4 months ago
- RDMA and SHARP plugins for nccl library ☆225 · Updated this week
- NVIDIA container runtime ☆1,124 · Oct 27, 2023 · Updated 2 years ago
- Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes ☆5,305 · Updated this week
- Build and run Docker containers leveraging NVIDIA GPUs ☆17,526 · Dec 6, 2023 · Updated 2 years ago
- Resource scheduling and cluster management for AI ☆2,683 · Jun 6, 2024 · Updated last year
- A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep lear… ☆5,658 · Apr 2, 2026 · Updated last week
- NVIDIA container runtime library ☆1,089 · Mar 30, 2026 · Updated last week
- A Slurm-based HPC workload management environment, driven by Ansible. ☆67 · Apr 1, 2026 · Updated last week