aws-neuron / neuronx-distributed
☆55 · Updated last month
Alternatives and similar repositories for neuronx-distributed:
Users who are interested in neuronx-distributed often compare it to the libraries listed below.
- ☆107 · Updated 3 months ago
- ☆34 · Updated last month
- Example code for AWS Neuron SDK developers building inference and training applications ☆143 · Updated last week
- ☆35 · Updated 4 months ago
- Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips. ☆229 · Updated this week
- ☆11 · Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆15 · Updated this week
- ☆24 · Updated last year
- EFA/NCCL base AMI build Packer and CodeBuild/Pipeline files. Also base Docker build files to enable EFA/NCCL in containers ☆43 · Updated last year
- Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and i… ☆510 · Updated 2 weeks ago
- ☆46 · Updated last week
- ☆14 · Updated this week
- ☆15 · Updated last month
- ☆24 · Updated 3 months ago
- ☆13 · Updated last month
- Foundation model benchmarking tool. Run any model on any AWS platform and benchmark for performance across instance type and serving stac… ☆239 · Updated 3 weeks ago
- Collection of best practices, reference architectures, model training examples and utilities to train large models on AWS. ☆285 · Updated this week
- A high performance data access library for machine learning tasks ☆74 · Updated last year
- The Amazon S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access and store data in Amazon S3. ☆154 · Updated this week
- Distributed preprocessing and data loading for language datasets ☆39 · Updated last year
- This is a plugin which lets EC2 developers use libfabric as network provider while running NCCL applications. ☆169 · Updated last week
- ☆251 · Updated 9 months ago
- Create, List, Update, Delete Amazon EKS clusters. Deploy and manage software on EKS. Run distributed model training and inference example… ☆58 · Updated last week
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the … ☆151 · Updated last week
- ☆95 · Updated this week
- ☆58 · Updated 2 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆194 · Updated this week
- A schedule language for large model training ☆146 · Updated 10 months ago
- ☆17 · Updated last year
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆157 · Updated 5 months ago