GoogleCloudPlatform / slurm-gcp
☆38 Updated this week
Alternatives and similar repositories for slurm-gcp:
Users who are interested in slurm-gcp compare it to the repositories listed below.
- Cluster Toolkit is open-source software offered by Google Cloud that makes it easy for customers to deploy AI/ML and HPC environments… ☆237 Updated this week
- xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool to help Cloud developers orchestrate training jobs on accelerat… ☆107 Updated this week
- ☆43 Updated last month
- ☆22 Updated this week
- Testing if I can implement Slurm in an operator ☆14 Updated 4 months ago
- Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud. ☆48 Updated this week
- GPU Environment Management for Visual Studio Code ☆37 Updated last year
- ☆137 Updated last week
- Testing framework for Deep Learning models (TensorFlow and PyTorch) on Google Cloud hardware accelerators (TPU and GPU) ☆64 Updated 4 months ago
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend. ☆109 Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆35 Updated this week
- Collection of scripts to build PyTorch and the domain libraries from source. ☆10 Updated 2 weeks ago
- Deploy your HPC cluster on AWS in 20 min. with just 1-Click. ☆64 Updated last year
- ☆15 Updated last week
- A simplified and automated orchestration workflow to perform ML end-to-end (E2E) model tests and benchmarking on Cloud VMs across differe… ☆40 Updated this week
- Deploy a Flux MiniCluster to Kubernetes with the operator ☆31 Updated 3 weeks ago
- The official evaluation suite and dynamic data release for MixEval. ☆11 Updated 6 months ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆55 Updated this week
- Slurm on Google Cloud Platform ☆183 Updated 6 months ago
- A Slurm dashboard for the terminal. ☆84 Updated last year
- Container plugin for Slurm Workload Manager ☆329 Updated 4 months ago
- Runner in charge of collecting metrics from LLM inference endpoints for the Unify Hub ☆17 Updated last year
- A parallel framework for training deep neural networks ☆57 Updated 2 weeks ago
- ☆30 Updated last week
- Kubernetes Operator, Ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. ☆91 Updated last week
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆92 Updated this week
- Tools to deploy GPU clusters in the Cloud ☆31 Updated last year
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆300 Updated this week
- ☆21 Updated 3 weeks ago
- ☆78 Updated 3 months ago