lopentusska / slurm_ubuntu_gpu_cluster
Instructions for setting up a Slurm GPU cluster on Ubuntu 22.04.
☆27 · Updated last year
Alternatives and similar repositories for slurm_ubuntu_gpu_cluster
Users interested in slurm_ubuntu_gpu_cluster are comparing it to the libraries listed below.
- Instructions for setting up a SLURM cluster using Ubuntu 18.04.3 with GPUs. ☆152 · Updated 4 years ago
- A Slurm cluster using docker-compose ☆384 · Updated 3 weeks ago
- Slurm in Docker - Exploring Slurm using CentOS 7 based Docker images ☆129 · Updated 5 years ago
- A dummy's guide to setting up (and using) HPC clusters on Ubuntu 22.04 LTS using Slurm and Munge. Created by the Quant Club @ UIowa. ☆332 · Updated last year
- Ansible role for installing and managing the Slurm Workload Manager ☆107 · Updated 4 months ago
- Container plugin for Slurm Workload Manager ☆369 · Updated this week
- Super Computing On Web ☆290 · Updated this week
- Open source web interface for Slurm HPC & AI clusters ☆467 · Updated 2 weeks ago
- NVIDIA NCCL Tests for Distributed Training ☆102 · Updated 2 weeks ago
- Jobstats is a job monitoring platform for CPU and GPU clusters ☆81 · Updated 2 weeks ago
- A Slurm cluster for Kubernetes ☆62 · Updated last year
- Slurm-Mail is a drop-in replacement for Slurm's e-mails to give users much more information about their jobs compared to the standard Slu… ☆111 · Updated this week
- A shim driver that allows in-docker nvidia-smi to show the correct process list without modifying anything ☆92 · Updated last month
- Tools for building GPU clusters ☆1,374 · Updated last month
- Benchmark Suite for Deep Learning ☆272 · Updated 5 months ago
- Steps to create a small Slurm cluster with GPU-enabled nodes ☆270 · Updated 2 years ago
- My tools for the Slurm HPC workload manager ☆527 · Updated last week
- ☆55 · Updated 8 months ago
- Determined AI public environments ☆49 · Updated 11 months ago
- Build NCCL-Tests and configure SSHD in a PyTorch container to help you test NCCL faster! ☆11 · Updated last year
- A benchmark framework for PyTorch ☆26 · Updated 4 months ago
- Prometheus exporter for performance metrics from Slurm. ☆257 · Updated last year
- NCCL Tests ☆1,209 · Updated 2 weeks ago
- A tool for bandwidth measurements on NVIDIA GPUs. ☆504 · Updated 3 months ago
- Optimized primitives for collective multi-GPU communication ☆9 · Updated last year
- An open-source toolkit for deploying and managing high performance clusters for HPC, AI, and data analytics workloads. ☆262 · Updated this week
- NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs ☆558 · Updated 3 months ago
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆13 · Updated 7 months ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆128 · Updated last month
- ☆314 · Updated 11 months ago