NVIDIA / multi-storage-client
Unified high-performance Python client for object and file stores.
☆28 Updated last week
Alternatives and similar repositories for multi-storage-client
Users who are interested in multi-storage-client compare it to the libraries listed below.
- Container plugin for Slurm Workload Manager ☆349 Updated 7 months ago
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆179 Updated 3 weeks ago
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆351 Updated 2 weeks ago
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆349 Updated this week
- AIStore: scalable storage for AI applications ☆1,535 Updated this week
- NVIDIA Inference Xfer Library (NIXL) ☆422 Updated last week
- TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup… ☆368 Updated last week
- Tools to deploy GPU clusters in the Cloud ☆31 Updated 2 years ago
- KvikIO - High Performance File IO ☆213 Updated this week
- Module, Model, and Tensor Serialization/Deserialization ☆241 Updated 2 weeks ago
- The Amazon S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access and store data in Amazon S3. ☆166 Updated this week
- Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimenta… ☆512 Updated 2 weeks ago
- NVIDIA NCCL Tests for Distributed Training ☆97 Updated this week
- ☆141 Updated 3 weeks ago
- Dragon distributed runtime for HPC and AI applications and workflows ☆72 Updated this week
- MIG Partition Editor for NVIDIA GPUs ☆202 Updated this week
- ☆62 Updated 4 months ago
- ☆310 Updated 10 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆76 Updated this week
- This repository contains the results and code for the MLPerf™ Training v3.1 benchmark. ☆17 Updated 5 months ago
- A tool to configure, launch and manage your machine learning experiments. ☆162 Updated this week
- ☆222 Updated this week
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆377 Updated this week
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆188 Updated this week
- This is a plugin which lets EC2 developers use libfabric as network provider while running NCCL applications. ☆176 Updated this week
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆62 Updated 3 months ago
- A collection of YAML files, Helm Charts, Operator code, and guides to act as an example reference implementation for NVIDIA NIM deploymen… ☆184 Updated 3 weeks ago
- ☆50 Updated 3 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆204 Updated this week
- PyTorch Single Controller ☆231 Updated this week