GoogleCloudPlatform / nvidia-nemo-on-gke
Training NVIDIA NeMo Megatron Large Language Model (LLM) using NeMo Framework on Google Kubernetes Engine
☆13Updated 4 months ago
Alternatives and similar repositories for nvidia-nemo-on-gke
Users who are interested in nvidia-nemo-on-gke are comparing it to the repositories listed below.
- Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.☆84Updated this week
- Terraform module for creating GKE clusters to run Kubeflow☆215Updated 4 years ago
- Amazon SageMaker operator for Kubernetes☆149Updated 2 years ago
- Volume Controller for Kubernetes☆67Updated 2 years ago
- Deep learning benchmark utility and optimization tips on EKS.☆48Updated 6 years ago
- AI on GKE is a collection of examples, best-practices, and prebuilt solutions to help build, deploy, and scale AI Platforms on Google Kub…☆324Updated 2 months ago
- Seldon Core Operator for Kubernetes☆12Updated 5 years ago
- A Terraform module for running Kubeflow on a kubernetes cluster.☆20Updated 3 years ago
- Train and Deploy Machine Learning Models on Kubernetes using Amazon EKS☆168Updated 6 years ago
- MLOps on Amazon EKS☆99Updated this week
- Repository for open inference protocol specification☆59Updated 4 months ago
- Kubernetes custom controller and CRDs for managing Airflow☆299Updated 5 years ago
- Cluster Toolkit is an open-source software offered by Google Cloud which makes it easy for customers to deploy AI/ML and HPC environments…☆286Updated this week
- Create, List, Update, Delete Amazon EKS clusters. Deploy and manage software on EKS. Run distributed model training and inference example…☆60Updated last month
- Airflow on Kubernetes Operator☆86Updated 2 years ago
- kfctl is a CLI for deploying and managing Kubeflow☆184Updated 2 years ago
- ☆46Updated 7 months ago
- Blueprints for Deploying Kubeflow on Google Cloud Platform and Anthos☆81Updated last year
- Deploying EFA in EKS utilizing GPUDirectRDMA where supported☆36Updated 11 months ago
- KubeFlow on AWS☆185Updated 3 weeks ago
- Repository for assets related to Metadata.☆124Updated 3 years ago
- Python SDK for building, training, and deploying ML models☆337Updated 3 years ago
- Argoflow has been superseded by deployKF☆136Updated 2 years ago
- CSI Driver of Amazon FSx for Lustre https://aws.amazon.com/fsx/lustre/☆140Updated last week
- ☆73Updated last year
- Repository used to maintain group ACLs used by Kubeflow developers☆18Updated this week
- Code name for Batch on GKE.☆50Updated 5 years ago
- [EOL] Anonymous Usage Collector☆74Updated 6 years ago
- Architecture and UX design of KAML-D☆14Updated 7 years ago
- ☆12Updated last year