GoogleCloudPlatform / slurm-gcp
☆57 · Updated this week
Alternatives and similar repositories for slurm-gcp
Users who are interested in slurm-gcp are comparing it to the repositories listed below.
- Cluster Toolkit is open-source software offered by Google Cloud which makes it easy for customers to deploy AI/ML and HPC environments… ☆297 · Updated this week
- xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool to help Cloud developers orchestrate training jobs on accelerat… ☆152 · Updated this week
- ☆42 · Updated 3 weeks ago
- ☆146 · Updated 2 weeks ago
- Repository of machine learning benchmarks ☆45 · Updated last week
- Kubernetes Operator, ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. ☆113 · Updated this week
- A top-like tool for monitoring GPUs in a cluster ☆85 · Updated last year
- Container plugin for Slurm Workload Manager ☆396 · Updated last week
- NVIDIA's launch, startup, and logging scripts used by our MLPerf Training and HPC submissions ☆33 · Updated 2 months ago
- Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud. ☆100 · Updated this week
- Run Slurm as a Kubernetes scheduler. A Slinky project. ☆48 · Updated last week
- This repository hosts code that supports the testing infrastructure for the PyTorch organization. For example, this repo hosts the logic … ☆103 · Updated this week
- MLPerf™ logging library ☆37 · Updated last month
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆308 · Updated last week
- ☆61 · Updated 2 years ago
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆390 · Updated 5 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆62 · Updated 2 months ago
- Home for OctoML PyTorch Profiler ☆114 · Updated 2 years ago
- TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup… ☆400 · Updated this week
- Module, Model, and Tensor Serialization/Deserialization ☆273 · Updated 3 months ago
- Slurm on Google Cloud Platform ☆188 · Updated last year
- Repository for open inference protocol specification ☆59 · Updated 6 months ago
- Deploy your HPC Cluster on AWS in 20min. with just 1-Click. ☆65 · Updated last year
- MLCube® is a project that reduces friction for machine learning by ensuring that models are easily portable and reproducible. ☆157 · Updated last year
- ☆15 · Updated last month
- Testing framework for Deep Learning models (Tensorflow and PyTorch) on Google Cloud hardware accelerators (TPU and GPU) ☆65 · Updated 5 months ago
- torchax is a PyTorch frontend for JAX. It gives JAX the ability to author JAX programs using familiar PyTorch syntax. It also provides JA… ☆128 · Updated this week
- LM engine is a library for pretraining/finetuning LLMs ☆77 · Updated this week
- JAX-Toolbox ☆363 · Updated this week
- ☆54 · Updated 2 weeks ago