awslabs / s3-connector-for-pytorch
The Amazon S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access and store data in Amazon S3.
☆123 · Updated this week
Related projects
Alternatives and complementary repositories for s3-connector-for-pytorch
- ☆168 · Updated last year
- ☆31 · Updated this week
- EFA/NCCL base AMI build Packer and CodeBuild/Pipeline files. Also base Docker build files to enable EFA/NCCL in containers ☆41 · Updated last year
- Create, List, Update, Delete Amazon EKS clusters. Deploy and manage software on EKS. Run distributed model training and inference example… ☆50 · Updated 3 weeks ago
- ☆100 · Updated 2 months ago
- This Guidance demonstrates how to deploy a machine learning inference architecture on Amazon Elastic Kubernetes Service (Amazon EKS). It … ☆39 · Updated 2 weeks ago
- ☆64 · Updated 4 months ago
- A helper library to connect into Amazon SageMaker with AWS Systems Manager and SSH (Secure Shell) ☆223 · Updated this week
- ☆22 · Updated 7 months ago
- Example code for AWS Neuron SDK developers building inference and training applications ☆129 · Updated last month
- A high performance data access library for machine learning tasks ☆74 · Updated last year
- Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and i… ☆464 · Updated this week
- ☆17 · Updated this week
- Collection of best practices, reference architectures, model training examples and utilities to train large models on AWS. ☆203 · Updated this week
- This is a plugin which lets EC2 developers use libfabric as network provider while running NCCL applications. ☆148 · Updated this week
- KubeFlow on AWS ☆173 · Updated last month
- Module, Model, and Tensor Serialization/Deserialization ☆189 · Updated last month
- AWS virtual gpu device plugin provides capability to use smaller virtual gpus for your machine learning inference workloads ☆203 · Updated last year
- Deploying EFA in EKS utilizing GPUDirectRDMA where supported ☆37 · Updated last month
- ACK service controller for Amazon SageMaker ☆41 · Updated last month
- Kubernetes Operator, ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. ☆78 · Updated this week
- CUDA checkpoint and restore utility ☆227 · Updated 7 months ago
- Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips. ☆210 · Updated this week
- ☆46 · Updated 3 weeks ago
- Toolkit for allowing inference and serving with PyTorch on SageMaker. Dockerfiles used for building SageMaker Pytorch Containers are at h… ☆134 · Updated last month
- TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup… ☆331 · Updated last week
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆58 · Updated this week
- MIG Partition Editor for NVIDIA GPUs ☆174 · Updated this week
- The Triton backend for the PyTorch TorchScript models. ☆127 · Updated this week
- Container plugin for Slurm Workload Manager ☆295 · Updated 2 weeks ago