NVIDIA / deepops
Tools for building GPU clusters
☆1,365 · Updated 2 months ago
Alternatives and similar repositories for deepops
Users who are interested in deepops are comparing it to the repositories listed below.
- A simple yet powerful tool to turn traditional container/OS images into unprivileged sandboxes · ☆777 · Updated 6 months ago
- NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs · ☆532 · Updated last month
- Container plugin for Slurm Workload Manager · ☆347 · Updated 7 months ago
- NVIDIA device plugin for Kubernetes · ☆3,281 · Updated this week
- Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.) · ☆483 · Updated last month
- Run cloud native workloads on NVIDIA GPUs · ☆180 · Updated last month
- AIStore: scalable storage for AI applications · ☆1,535 · Updated this week
- Tools for monitoring NVIDIA GPUs on Linux · ☆1,041 · Updated 3 years ago
- MIG Partition Editor for NVIDIA GPUs · ☆201 · Updated last week
- NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes · ☆2,175 · Updated this week
- Open source web interface for Slurm HPC & AI clusters · ☆441 · Updated last week
- GPU environment and cluster management with LLM support · ☆611 · Updated last year
- Prometheus exporter for performance metrics from Slurm · ☆253 · Updated last year
- NVIDIA GPU metrics exporter for Prometheus leveraging DCGM · ☆1,243 · Updated 3 weeks ago
- GPU plugin to the node feature discovery for Kubernetes · ☆300 · Updated last year
- Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes · ☆377 · Updated last week
- PyTorch on Kubernetes · ☆309 · Updated 3 years ago
- GPU Sharing Scheduler for Kubernetes Cluster · ☆1,477 · Updated last year
- NVIDIA container runtime library · ☆971 · Updated 2 weeks ago
- Share GPU between Pods in Kubernetes · ☆209 · Updated 2 years ago
- HPC Container Maker · ☆482 · Updated 3 months ago
- Steps to create a small Slurm cluster with GPU-enabled nodes · ☆270 · Updated 2 years ago
- NVIDIA container runtime · ☆1,117 · Updated last year
- An open-source toolkit for deploying and managing high performance clusters for HPC, AI, and data analytics workloads · ☆259 · Updated this week
- KAI Scheduler is an open-source, Kubernetes-native scheduler for AI workloads at large scale · ☆656 · Updated last week
- Slurm on Google Cloud Platform · ☆188 · Updated 9 months ago
- A JupyterLab extension for displaying dashboards of GPU usage · ☆649 · Updated last week
- Reference implementations of MLPerf™ training benchmarks · ☆1,684 · Updated this week
- Kubeflow Deployment Manifests · ☆922 · Updated this week
- A Slurm cluster using docker-compose · ☆377 · Updated 8 months ago