GoogleCloudPlatform / nvidia-nemo-on-gke
Training NVIDIA NeMo Megatron Large Language Model (LLM) using NeMo Framework on Google Kubernetes Engine
☆12 Updated last week
Alternatives and similar repositories for nvidia-nemo-on-gke:
Users who are interested in nvidia-nemo-on-gke are comparing it to the libraries listed below.
- Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud. ☆61 Updated 2 weeks ago
- Testing framework for Deep Learning models (TensorFlow and PyTorch) on Google Cloud hardware accelerators (TPU and GPU) ☆64 Updated 5 months ago
- xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerat… ☆117 Updated last week
- A simplified and automated orchestration workflow to perform ML end-to-end (E2E) model tests and benchmarking on Cloud VMs across differe… ☆45 Updated this week
- JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆322 Updated this week
- ☆138 Updated last week
- Apache YuniKorn Scheduler Interface ☆29 Updated this week
- Cluster Toolkit is open-source software offered by Google Cloud that makes it easy for customers to deploy AI/ML and HPC environments… ☆250 Updated this week
- ☆43 Updated 3 months ago
- ☆16 Updated last month
- Test infrastructure and tooling for Kubeflow. ☆62 Updated 2 months ago
- Repository used to maintain group ACLs used by Kubeflow developers ☆18 Updated this week
- Secure HDFS Access from Kubernetes ☆60 Updated 4 years ago
- AI on GKE is a collection of examples, best practices, and prebuilt solutions to help build, deploy, and scale AI Platforms on Google Kub… ☆311 Updated this week
- Blueprints for Deploying Kubeflow on Google Cloud Platform and Anthos ☆82 Updated last year
- ☆13 Updated 3 weeks ago
- Infrastructure as code for GPU accelerated managed Kubernetes clusters. ☆55 Updated last week
- Terraform module for creating GKE clusters to run Kubeflow ☆215 Updated 4 years ago
- A top-like tool for monitoring GPUs in a cluster ☆86 Updated last year
- Repository for making a GitHub Action for deploying to Kubeflow. ☆35 Updated 3 years ago
- Introduction to Ray Core Design Patterns and APIs. ☆68 Updated last year
- Seldon Core Operator for Kubernetes ☆12 Updated 5 years ago
- Ray-based Apache Beam runner ☆42 Updated last year
- Deploying EFA in EKS utilizing GPUDirectRDMA where supported ☆37 Updated 6 months ago
- Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud PubSub ☆37 Updated 7 years ago
- Amazon SageMaker operator for Kubernetes ☆149 Updated last year
- Repository for open inference protocol specification ☆54 Updated 9 months ago
- A collection of YAML files, Helm Charts, Operator code, and guides to act as an example reference implementation for NVIDIA NIM deploymen… ☆179 Updated this week
- A direct Google Cloud Storage integration for PyTorch ☆37 Updated last month
- ☆58 Updated last year