lopentusska / slurm_ubuntu_gpu_cluster
Instructions for setting up a Slurm GPU cluster on Ubuntu 22.04.
☆15 · Updated 10 months ago
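For context, a GPU-enabled Slurm setup like the one this repository documents typically declares GPUs as a generic resource (GRES) in the cluster configuration. A minimal sketch is shown below; the hostname, GPU count, CPU count, and memory values are illustrative assumptions, not taken from the repository:

```
# /etc/slurm/slurm.conf (fragment) — enable GPU scheduling via GRES
GresTypes=gpu
NodeName=gpu-node01 Gres=gpu:2 CPUs=32 RealMemory=128000 State=UNKNOWN
PartitionName=gpu Nodes=gpu-node01 Default=YES MaxTime=INFINITE State=UP

# /etc/slurm/gres.conf on gpu-node01 — map GRES entries to device files
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
```

Jobs can then request GPUs with, e.g., `sbatch --gres=gpu:1 job.sh`.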
Alternatives and similar repositories for slurm_ubuntu_gpu_cluster
Users interested in slurm_ubuntu_gpu_cluster are comparing it to the libraries listed below:
- Instructions for setting up a SLURM cluster using Ubuntu 18.04.3 with GPUs. ☆140 · Updated 4 years ago
- A Python library that transfers PyTorch tensors between CPU and NVMe. ☆102 · Updated last month
- Elixir: Train a Large Language Model on a Small GPU Cluster. ☆13 · Updated last year
- ☆57 · Updated 7 months ago
- CloudAI Benchmark Framework. ☆47 · Updated this week
- LLM-Inference-Bench. ☆26 · Updated last week
- A parallel framework for training deep neural networks. ☆49 · Updated this week
- Example ML projects that use the Determined library. ☆25 · Updated 4 months ago
- A minimal implementation of vLLM. ☆32 · Updated 5 months ago
- MLPerf™ logging library. ☆32 · Updated last week
- ☆62 · Updated last month
- NGC Container Replicator. ☆28 · Updated 2 years ago
- Container plugin for the Slurm Workload Manager. ☆311 · Updated 2 months ago
- Intel Gaudi's Megatron-DeepSpeed large language models for training. ☆13 · Updated last month
- NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference. ☆62 · Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs. ☆11 · Updated this week
- PyTorch library for cost-effective, fast, and easy serving of MoE models. ☆112 · Updated last month
- Breaking the Throughput-Latency Trade-off for Long Sequences with Speculative Decoding. ☆107 · Updated last month
- Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers. ☆204 · Updated 4 months ago
- Distributed IO-aware attention algorithm. ☆18 · Updated 4 months ago
- pytorch-profiler. ☆50 · Updated last year
- ☆114 · Updated 10 months ago
- No-GIL Python environment featuring NVIDIA deep learning libraries. ☆39 · Updated 2 months ago
- Modular and structured prompt caching for low-latency LLM inference. ☆83 · Updated 2 months ago
- [ICDCS 2023] DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining. ☆12 · Updated last year
- The CUDA target for Numba. ☆41 · Updated last week
- ☆25 · Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk. ☆64 · Updated this week
- Performance benchmarking with ColossalAI. ☆39 · Updated 2 years ago
- CUDA 12.2 HMM demos. ☆19 · Updated 5 months ago