Create an Amazon EKS cluster and run a distributed training example
☆29 · Aug 19, 2024 · Updated last year
Alternatives and similar repositories for aws-distributed-training-workshop-eks
Users that are interested in aws-distributed-training-workshop-eks are comparing it to the libraries listed below
- Create, List, Update, Delete Amazon EKS clusters. Deploy and manage software on EKS. Run distributed model training and inference example… ☆64 · Mar 14, 2026 · Updated last week
- Openfold inference architecture for Amazon EKS ☆11 · Oct 1, 2024 · Updated last year
- A do-framework project to simplify deployment of Kubeflow on Amazon EKS ☆22 · Feb 18, 2025 · Updated last year
- Create and manage Amazon SageMaker HyperPod clusters, run distributed model training ☆24 · Jan 29, 2026 · Updated last month
- A CLI tool that helps manage training jobs on the SageMaker HyperPod clusters orchestrated by Amazon EKS ☆33 · Updated this week
- ☆12 · May 30, 2025 · Updated 9 months ago
- ☆15 · Mar 15, 2021 · Updated 5 years ago
- Collection of best practices, reference architectures, model training examples and utilities to train large models on AWS. ☆403 · Updated this week
- ☆58 · Feb 5, 2026 · Updated last month
- Deploy and scale distributed python applications on Amazon EKS using Ray ☆19 · Updated this week
- Kubernetes cluster deployment Ansible playbook ☆34 · Aug 15, 2018 · Updated 7 years ago
- A wrapper around SageMaker ML Lineage Tracking extending ML Lineage to end-to-end ML lifecycles, including additional capabilities around… ☆16 · Oct 14, 2021 · Updated 4 years ago
- ☆13 · May 8, 2023 · Updated 2 years ago
- Some Ansible plays & roles to install Rancher and Kubernetes Cluster ☆43 · Feb 16, 2024 · Updated 2 years ago
- [INACTIVE] A real-time, collaborative, HTML5 drawing widget powered by KineticJS / FabricJS and inspired by Literally Canvas. ☆10 · Feb 9, 2014 · Updated 12 years ago
- Azure Authentication Plugin for Vault ☆17 · Updated this week
- ☆10 · Jan 23, 2023 · Updated 3 years ago
- GPT-jax based on the official huggingface library ☆13 · Jun 22, 2021 · Updated 4 years ago
- ☆14 · Oct 31, 2024 · Updated last year
- Implementation of the SOTA Transformer architecture from PaLM - Scaling Language Modeling with Pathways in JAX/Flax ☆14 · Jun 22, 2022 · Updated 3 years ago
- This project compares the performance of Swin-Transformer v2 implemented in JAX and PyTorch. ☆12 · Jun 8, 2022 · Updated 3 years ago
- Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core. ☆176 · Updated this week
- ☆13 · Oct 20, 2018 · Updated 7 years ago
- Source code of the initial version of the EBS Optimizer tool made available on the AWS Marketplace. ☆18 · Sep 27, 2021 · Updated 4 years ago
- Whit is an open source SMS service, which allows you to query CrunchBase, Wikipedia, and several other data APIs. ☆197 · May 19, 2013 · Updated 12 years ago
- Experiment management with Hydra and MLflow ☆13 · Nov 20, 2020 · Updated 5 years ago
- Serverless application to monitor an AWS Batch architecture through dashboards. ☆64 · Dec 2, 2025 · Updated 3 months ago
- ☆11 · Mar 16, 2021 · Updated 5 years ago
- ☆41 · Aug 27, 2024 · Updated last year
- Candidate for the Kaggle 2017 Data Science Bowl - Automatic detection of lung cancer from CT scans ☆10 · Apr 7, 2017 · Updated 8 years ago
- This project contains source code and supporting files for a serverless application which can be used for detecting defects in products i… ☆16 · May 23, 2025 · Updated 9 months ago
- Wardens Assembly Workshops ☆11 · Apr 19, 2023 · Updated 2 years ago
- Scheduling app to search for free blocks of time on your Google Calendar (using Flask). ☆16 · Jun 24, 2013 · Updated 12 years ago
- ☆19 · Nov 8, 2023 · Updated 2 years ago
- Winning solution for the Rakuten Data Challenge, as part of SIGIR eCom '18. ☆22 · Aug 11, 2018 · Updated 7 years ago
- Web server for S3 compatible storage ☆25 · Apr 23, 2024 · Updated last year
- ☆16 · Feb 2, 2024 · Updated 2 years ago
- ☆13 · May 16, 2023 · Updated 2 years ago
- GitHub Action for building an ARM Template from Bicep ☆13 · Jun 18, 2022 · Updated 3 years ago