GoogleCloudPlatform / slurm-gcp
☆35 · Updated this week
Alternatives and similar repositories for slurm-gcp:
Users who are interested in slurm-gcp are comparing it to the libraries listed below:
- Cluster Toolkit is an open-source software offered by Google Cloud which makes it easy for customers to deploy AI/ML and HPC environments… ☆229 · Updated this week
- xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool to help Cloud developers to orchestrate training jobs on accelerat… ☆103 · Updated this week
- Container plugin for Slurm Workload Manager ☆320 · Updated 3 months ago
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend. ☆107 · Updated 3 weeks ago
- ☆14 · Updated this week
- Azure CycleCloud project to enable users to create, configure, and use Slurm HPC clusters. ☆62 · Updated this week
- ☆134 · Updated 2 weeks ago
- A Slurm dashboard for the terminal. ☆82 · Updated 11 months ago
- ☆40 · Updated 2 weeks ago
- This repository hosts code that supports the testing infrastructure for the PyTorch organization. For example, this repo hosts the logic … ☆86 · Updated this week
- Slurm on Google Cloud Platform ☆183 · Updated 5 months ago
- Carbon Limiting Auto Tuning for Kubernetes ☆33 · Updated 3 months ago
- NVIDIA's launch, startup, and logging scripts used by our MLPerf Training and HPC submissions ☆24 · Updated this week
- Modular, scalable library to train ML models ☆60 · Updated this week
- GPU Environment Management for Visual Studio Code ☆37 · Updated last year
- Testing framework for Deep Learning models (Tensorflow and PyTorch) on Google Cloud hardware accelerators (TPU and GPU) ☆64 · Updated 2 months ago
- Runner in charge of collecting metrics from LLM inference endpoints for the Unify Hub ☆17 · Updated last year
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆252 · Updated 2 weeks ago
- A collection of YAML files, Helm Charts, Operator code, and guides to act as an example reference implementation for NVIDIA NIM deploymen… ☆157 · Updated last week
- Deploy your HPC Cluster on AWS in 20 min with just 1-Click. ☆63 · Updated 11 months ago
- SmartSim Infrastructure Library Clients. ☆54 · Updated 3 months ago
- JAX-Toolbox ☆280 · Updated this week
- OCI-compatible engine to deploy Linux containers on HPC environments. ☆134 · Updated 3 months ago
- ☆22 · Updated this week
- A simplified and automated orchestration workflow to perform ML end-to-end (E2E) model tests and benchmarking on Cloud VMs across differe… ☆35 · Updated this week
- Serialize JAX, Flax, Haiku, or Objax model params with 🤗 `safetensors` ☆44 · Updated 8 months ago
- A top-like tool for monitoring GPUs in a cluster ☆84 · Updated last year
- Collection of scripts to build PyTorch and the domain libraries from source. ☆10 · Updated 2 weeks ago
- Orbax provides common checkpointing and persistence utilities for JAX users ☆338 · Updated this week