neurokernel / gpu-cluster-config
How to Configure a GPU Cluster Running Ubuntu Linux
☆54 · Updated 7 years ago
Related projects
Alternatives and complementary repositories for gpu-cluster-config
- Scheduling GPU cluster workloads with Slurm ☆74 · Updated 6 years ago
- Steps to create a small slurm cluster with GPU enabled nodes ☆263 · Updated last year
- Instructions for setting up a SLURM cluster using Ubuntu 18.04.3 with GPUs. ☆137 · Updated 3 years ago
- PyProf2: PyTorch Profiling tool ☆83 · Updated 4 years ago
- Tools to deploy GPU clusters in the Cloud ☆30 · Updated last year
- This repository contains the results and code for the MLPerf™ Training v0.5 benchmark. ☆35 · Updated last year
- Container plugin for Slurm Workload Manager ☆294 · Updated 2 weeks ago
- ☆32 · Updated 7 years ago
- Slurm in Docker - Exploring Slurm using CentOS 7 based Docker images ☆120 · Updated 5 years ago
- Python 3 Bindings for NVML library. Get NVIDIA GPU status inside your program. ☆239 · Updated 2 years ago
- Custom Slurm tools ☆23 · Updated 6 years ago
- SLURM Example Scripts ☆69 · Updated 5 years ago
- gather and plot data about Slurm scheduling and job statistics ☆50 · Updated 10 years ago
- Personal collection of references for high performance mixed precision training. ☆41 · Updated 5 years ago
- PyTorch-MPI-DDP-example ☆17 · Updated 6 years ago
- Ansible role for installing and managing the Slurm Workload Manager ☆88 · Updated 7 months ago
- Tools and extensions for CUDA profiling ☆63 · Updated 4 years ago
- Use TensorFlow efficiently ☆95 · Updated 3 years ago
- nvidia-smi but for an entire GPU cluster ☆76 · Updated 9 months ago
- Python bindings for NVTX ☆66 · Updated last year
- This repository contains the results and code for the MLPerf™ Training v0.6 benchmark. ☆42 · Updated last year
- ☆26 · Updated last year
- This repository contains the results and code for the MLPerf™ Training v0.7 benchmark. ☆56 · Updated last year
- Microway's improved version of GPU Burn ☆86 · Updated 3 months ago
- Convert nvprof profiles into about:tracing compatible JSON files ☆67 · Updated 3 years ago
- Slurm SPANK plugin to ease setup of SSH tunnels and port forwarding ☆11 · Updated 8 months ago
- Deep Learning Benchmarking Suite ☆130 · Updated last year
- HPC Container Maker ☆457 · Updated 3 weeks ago
- Reference implementations of MLPerf™ HPC training benchmarks ☆42 · Updated 5 months ago
- Template for Deploying Distributed TensorFlow on Clusters Using MPI ☆15 · Updated 5 years ago