neurokernel / gpu-cluster-config
How to Configure a GPU Cluster Running Ubuntu Linux
☆55 Updated 8 years ago
Alternatives and similar repositories for gpu-cluster-config:
Users who are interested in gpu-cluster-config are comparing it to the libraries listed below.
- Scheduling GPU cluster workloads with Slurm ☆74 Updated 6 years ago
- Steps to create a small slurm cluster with GPU enabled nodes ☆267 Updated last year
- Instructions for setting up a SLURM cluster using Ubuntu 18.04.3 with GPUs. ☆143 Updated 4 years ago
- Tools to deploy GPU clusters in the Cloud ☆30 Updated last year
- This repository contains the results and code for the MLPerf™ Training v0.5 benchmark. ☆35 Updated last year
- Container plugin for Slurm Workload Manager ☆314 Updated 2 months ago
- Personal collection of references for high performance mixed precision training. ☆41 Updated 5 years ago
- This repository contains the results and code for the MLPerf™ Training v0.7 benchmark. ☆56 Updated last year
- Bugfixing fork of Python bindings for the NVIDIA GPU Management Library ☆51 Updated 7 years ago
- Reference implementations of MLPerf™ HPC training benchmarks ☆45 Updated 8 months ago
- This repository contains the results and code for the MLPerf™ Training v0.6 benchmark. ☆42 Updated last year
- files and instructions for creating and using example containers from the sylabs.io blog ☆104 Updated last year
- oneCCL Bindings for Pytorch* ☆87 Updated 3 weeks ago
- PyTorch-MPI-DDP-example ☆17 Updated 6 years ago
- gather and plot data about Slurm scheduling and job statistics ☆51 Updated 10 years ago
- ☆97 Updated 4 months ago
- Python bindings for NVTX ☆66 Updated last year
- Monitor your GPUs whether they are on a single computer or in a cluster ☆161 Updated 5 years ago
- Slurm in Docker - Exploring Slurm using CentOS 7 based Docker images ☆126 Updated 5 years ago
- Microway's improved version of GPU Burn ☆88 Updated 5 months ago
- Code examples for CUDA and OpenACC ☆34 Updated 5 months ago
- HPC Container Maker ☆464 Updated 2 weeks ago
- Custom Slurm tools ☆24 Updated 6 years ago
- PyProf2: PyTorch Profiling tool ☆82 Updated 4 years ago
- My tools for the Slurm HPC workload manager ☆473 Updated this week
- MPI Testing Tool ☆64 Updated last month
- Prometheus exporter for slurm job/node data ☆33 Updated 5 months ago
- SLURM Example Scripts ☆70 Updated 5 years ago
- A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory ☆296 Updated 6 years ago
- Distributed Learning by Pair-Wise Averaging ☆53 Updated 7 years ago