GoogleCloudPlatform / nvidia-nemo-on-gke
Training NVIDIA NeMo Megatron Large Language Model (LLM) using NeMo Framework on Google Kubernetes Engine
☆12 · Updated 3 months ago
Alternatives and similar repositories for nvidia-nemo-on-gke:
Users who are interested in nvidia-nemo-on-gke are comparing it to the repositories listed below.
- Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud. ☆37 · Updated this week
- ☆14 · Updated last month
- Testing framework for deep learning models (TensorFlow and PyTorch) on Google Cloud hardware accelerators (TPU and GPU) ☆64 · Updated 2 months ago
- Cluster Toolkit is open-source software offered by Google Cloud that makes it easy for customers to deploy AI/ML and HPC environments… ☆229 · Updated this week
- ☆40 · Updated 2 weeks ago
- AI on GKE is a collection of examples, best-practices, and prebuilt solutions to help build, deploy, and scale AI Platforms on Google Kub… ☆279 · Updated this week
- Tools to deploy GPU clusters in the Cloud ☆30 · Updated last year
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆275 · Updated this week
- ☆66 · Updated 7 months ago
- A collection of Machine Learning examples to get started with deploying RAPIDS in the Cloud ☆139 · Updated 3 months ago
- xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool to help Cloud developers orchestrate training jobs on accelerat… ☆103 · Updated this week
- Deep learning benchmark utility and optimization tips on EKS. ☆48 · Updated 5 years ago
- A simplified and automated orchestration workflow to perform ML end-to-end (E2E) model tests and benchmarking on Cloud VMs across differe… ☆35 · Updated this week
- ML Pipeline Generator is a tool for generating end-to-end pipelines composed of GCP components so that any customer can easily migrate th… ☆50 · Updated 3 years ago
- Deploying EFA in EKS utilizing GPUDirectRDMA where supported ☆37 · Updated 4 months ago
- Blueprints for Deploying Kubeflow on Google Cloud Platform and Anthos ☆82 · Updated 9 months ago
- Terraform module for creating GKE clusters to run Kubeflow ☆215 · Updated 4 years ago
- ☆134 · Updated 2 weeks ago
- ☆46 · Updated 9 months ago
- A direct Google Cloud Storage integration for PyTorch ☆35 · Updated 2 months ago
- This example shows how to produce multimodal videos with audio using the Kinetics dataset on AWS Trainium and EC2 GPU orchestrated by EKS… ☆4 · Updated 6 months ago
- Repository used to maintain group ACLs used by Kubeflow developers ☆16 · Updated this week
- Infrastructure as code for GPU accelerated managed Kubernetes clusters. ☆51 · Updated 2 months ago
- MLCube® is a project that reduces friction for machine learning by ensuring that models are easily portable and reproducible. ☆154 · Updated 5 months ago
- Code labs for Vertex AI ☆46 · Updated 3 years ago
- Cloud Build for Deploying Datapipelines with Composer, Dataflow and BigQuery ☆64 · Updated 4 years ago
- Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service ☆64 · Updated 9 months ago
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆54 · Updated this week
- Amazon EMR on EKS Custom Image CLI ☆27 · Updated 4 months ago
- Kustomize manifest to deploy kubeflow pipelines in AWS ☆21 · Updated 3 years ago