NVIDIA / deepops
Tools for building GPU clusters
☆1,400 Updated 4 months ago
Alternatives and similar repositories for deepops
Users who are interested in deepops are comparing it to the repositories listed below.
- Tools for monitoring NVIDIA GPUs on Linux ☆1,057 Updated 4 years ago
- Container plugin for Slurm Workload Manager ☆389 Updated last month
- MIG Partition Editor for NVIDIA GPUs ☆222 Updated this week
- Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.) ☆499 Updated 3 weeks ago
- NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs ☆608 Updated 2 weeks ago
- A simple yet powerful tool to turn traditional container/OS images into unprivileged sandboxes. ☆843 Updated 2 weeks ago
- NVIDIA device plugin for Kubernetes ☆3,501 Updated this week
- Run cloud native workloads on NVIDIA GPUs ☆202 Updated 3 weeks ago
- GPU plugin to the node feature discovery for Kubernetes ☆305 Updated last year
- NVIDIA container runtime ☆1,123 Updated 2 years ago
- NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes ☆2,371 Updated this week
- Kubeflow Deployment Manifests ☆963 Updated this week
- NVIDIA GPU metrics exporter for Prometheus leveraging DCGM ☆1,448 Updated last week
- Fork of NVIDIA device plugin for Kubernetes with support for shared GPUs by declaring GPUs multiple times ☆89 Updated 3 years ago
- NVIDIA container runtime library ☆1,031 Updated last week
- Tools to deploy GPU clusters in the Cloud ☆33 Updated 2 years ago
- PyTorch on Kubernetes ☆309 Updated 3 years ago
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv… ☆495 Updated this week
- GPU environment and cluster management with LLM support ☆652 Updated last year
- AIStore: scalable storage for AI applications ☆1,616 Updated this week
- NCCL Tests ☆1,313 Updated this week
- An open-source toolkit for deploying and managing high performance clusters for HPC, AI, and data analytics workloads. ☆279 Updated this week
- Automated Machine Learning on Kubernetes ☆1,634 Updated 2 weeks ago
- markdown docs ☆94 Updated this week
- HPC Container Maker ☆496 Updated last week
- Benchmark Suite for Deep Learning ☆278 Updated 3 weeks ago
- Distributed AI Model Training and Fine-Tuning on Kubernetes ☆1,953 Updated this week
- Open source web interface for Slurm HPC & AI clusters ☆504 Updated this week
- GPU Sharing Scheduler for Kubernetes Cluster ☆1,512 Updated last year
- Kubernetes (k8s) device plugin to enable registration of AMD GPU to a container cluster ☆353 Updated last week