GoogleCloudPlatform / slurm-gcp
☆60Updated this week
Alternatives and similar repositories for slurm-gcp
Users who are interested in slurm-gcp are comparing it to the libraries listed below.
- Cluster Toolkit is open-source software offered by Google Cloud that makes it easy for customers to deploy AI/ML and HPC environments…☆310Updated this week
- xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerat…☆162Updated last week
- ☆151Updated 3 weeks ago
- Kubernetes Operator, ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes.☆124Updated this week
- ☆44Updated last week
- ☆16Updated 3 months ago
- NVIDIA's launch, startup, and logging scripts used by our MLPerf Training and HPC submissions☆35Updated 4 months ago
- This repository hosts code that supports the testing infrastructure for the PyTorch organization. For example, this repo hosts the logic …☆104Updated this week
- A top-like tool for monitoring GPUs in a cluster☆84Updated last year
- Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.☆112Updated this week
- ☆72Updated last week
- Repository of machine learning benchmarks☆49Updated 2 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference☆63Updated 4 months ago
- Run Slurm as a Kubernetes scheduler. A Slinky project.☆61Updated last week
- Slurm on Google Cloud Platform☆190Updated last year
- Container plugin for Slurm Workload Manager☆412Updated 3 weeks ago
- Deploy your HPC cluster on AWS in 20 minutes with just one click.☆67Updated last year
- A simplified and automated orchestration workflow to perform ML end-to-end (E2E) model tests and benchmarking on Cloud VMs across differe…☆57Updated this week
- Repository for open inference protocol specification☆63Updated 8 months ago
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel…☆403Updated 3 weeks ago
- Testing framework for Deep Learning models (Tensorflow and PyTorch) on Google Cloud hardware accelerators (TPU and GPU)☆64Updated 3 weeks ago
- MLCube® is a project that reduces friction for machine learning by ensuring that models are easily portable and reproducible.☆158Updated 2 months ago
- LM engine is a library for pretraining/finetuning LLMs☆113Updated this week
- ☆48Updated 3 weeks ago
- Home for OctoML PyTorch Profiler☆113Updated 2 years ago
- MLPerf™ logging library☆38Updated last month
- Cloud Native Benchmarking of Foundation Models☆44Updated 6 months ago
- General policies for MLPerf® benchmarks including submission rules, coding standards, etc.☆31Updated last week
- ☆69Updated this week
- ☆15Updated 4 years ago