lopentusska / slurm_ubuntu_gpu_cluster
Instructions for setting up a Slurm gpu cluster on Ubuntu 22.04.
☆30 Updated last year
Alternatives and similar repositories for slurm_ubuntu_gpu_cluster
Users who are interested in slurm_ubuntu_gpu_cluster are comparing it to the repositories listed below
- Instructions for setting up a SLURM cluster using Ubuntu 18.04.3 with GPUs. ☆153 Updated last week
- Container plugin for Slurm Workload Manager ☆396 Updated last week
- Slurm in Docker - Exploring Slurm using CentOS 7 based Docker images ☆129 Updated 6 years ago
- Jobstats is a job monitoring platform for CPU and GPU clusters ☆109 Updated 3 weeks ago
- A Slurm cluster using docker-compose ☆410 Updated this week
- A dummy's guide to setting up (and using) HPC clusters on Ubuntu 22.04LTS using Slurm and Munge. Created by the Quant Club @ UIowa. ☆378 Updated last year
- Ansible role for installing and managing the Slurm Workload Manager ☆111 Updated 7 months ago
- My tools for the Slurm HPC workload manager ☆554 Updated last month
- Prometheus exporter for performance metrics from Slurm. ☆269 Updated last year
- Open source web interface for Slurm HPC & AI clusters ☆508 Updated this week
- Super Computing On Web ☆303 Updated last week
- Steps to create a small slurm cluster with GPU enabled nodes ☆271 Updated 2 years ago
- Export select slurm metrics to prometheus ☆61 Updated 2 months ago
- Supercomputing. Seamlessly. Open, Interactive HPC Via the Web ☆395 Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆123 Updated last week
- Slurm-Mail is a drop in replacement for Slurm's e-mails to give users much more information about their jobs compared to the standard Slu… ☆113 Updated last week
- A Slurm cluster for Kubernetes ☆65 Updated last year
- Build NCCL-Tests and configure SSHD in PyTorch container to help you test NCCL faster! ☆13 Updated 2 months ago
- Environment modules for NGC containers ☆29 Updated 4 years ago
- Tutorial for installing Open XDMoD, OnDemand, & ColdFront ☆160 Updated 5 months ago
- Resources for Large Language Model Inference ☆16 Updated last year
- Cluster/HPC installation for diskless compute nodes ☆46 Updated 5 months ago
- A distributed scheduling system for HPC and AI workloads ☆123 Updated this week
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆15 Updated 11 months ago
- A tool for bandwidth measurements on NVIDIA GPUs. ☆568 Updated 7 months ago
- An open-source toolkit for deploying and managing high performance clusters for HPC, AI, and data analytics workloads. ☆281 Updated this week
- Python Interface to Slurm ☆551 Updated last week
- DGXC Benchmarking provides recipes in ready-to-use templates for evaluating performance of specific AI use cases across hardware and soft… ☆46 Updated last week
- Determined AI public environments ☆49 Updated last year
- ☆316 Updated last year