NVIDIA / ais-etl
Provides for deploying custom ETL containers on AIStore, with subsequent user-defined extraction, transformation, and loading performed in parallel, on the fly and/or offline, locally to user data.
☆16, updated 2 weeks ago
Alternatives and similar repositories for ais-etl:
Users interested in ais-etl are comparing it to the libraries listed below:
- Kubernetes Operator, Ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. (☆89, updated this week)
- Ray-based Apache Beam runner (☆43, updated last year)
- A portable Pythonic Data Lakehouse powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to … (☆198, updated this week)
- RAPIDS GPU-BDB (☆108, updated last year)
- This repository contains example integrations between Determined and other ML products (☆48, updated 11 months ago)
- MLCube® is a project that reduces friction for machine learning by ensuring that models are easily portable and reproducible. (☆154, updated 6 months ago)
- Exoshuffle-CloudSort (☆24, updated 2 years ago)
- ☁️ Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow. (☆45, updated last week)
- An fsspec implementation for the lakeFS project (☆46, updated last week)
- xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool to help cloud developers orchestrate training jobs on accelerat… (☆107, updated this week)
- KvikIO - High Performance File IO (☆195, updated this week)
- Rayvens makes it possible for data scientists to access hundreds of data services within Ray with little effort. (☆50, updated 2 years ago)
- Prepare requirements and deploy Flyte using Helm (☆65, updated 2 weeks ago)
- Unified storage framework for the entire machine learning lifecycle (☆155, updated last year)
- AI Data Management & Evaluation Platform (☆215, updated last year)
- Aqueduct is no longer being maintained. Aqueduct allows you to run LLM and ML workloads on any cloud infrastructure. (☆521, updated last year)
- The Amazon S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access and store data in Amazon S3. (☆147, updated this week)
- Extensible Python SDK for developing Flyte tasks and workflows. Simple to get started and learn and highly extensible. (☆266, updated this week)
- Utilities for Dask and CUDA interactions (☆301, updated last week)
- A set of IaC artifacts to automatically configure the infrastructure resources needed by a Flyte deployment (☆26, updated 3 weeks ago)
- Morpheus Runtime Core (MRC) (☆49, updated this week)
- A top-like tool for monitoring GPUs in a cluster (☆86, updated last year)
- TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup… (☆354, updated this week)
- Cloud provider cluster managers for Dask. Supports AWS, Google Cloud, Azure, and more. (☆139, updated last month)
- cuVS - a library for vector search and clustering on the GPU (☆331, updated this week)
- Module, Model, and Tensor Serialization/Deserialization (☆220, updated last month)
- Synchronicity lets you interoperate with asynchronous Python APIs. (☆106, updated last month)