uber/petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.

☆1,888

Alternatives and similar repositories for petastorm

Users who are interested in petastorm are comparing it to the libraries listed below.

horovod / horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
☆14,692 · Jun 20, 2026 · Updated last month
databricks / koalas
Koalas: pandas API on Apache Spark
☆3,371 · Mar 20, 2024 · Updated 2 years ago
polyaxon / polyaxon
AI Infra / AI Orchestration / AI Control Plane
☆3,715 · Jul 15, 2026 · Updated last week
ludwig-ai / ludwig
Low-code framework for building custom LLMs, neural networks, and other AI models
☆11,743 · Updated this week
ray-project / ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
☆43,305 · Updated this week
feast-dev / feast
The Open Source Feature Store for AI/ML
☆7,142 · Updated this week
pytorch / torcharrow
High performance model preprocessing library on PyTorch
☆641 · Mar 29, 2024 · Updated 2 years ago
mlflow / mlflow
The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, a…
☆27,140 · Updated this week
SeldonIO / seldon-core
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
☆4,763 · Mar 23, 2026 · Updated 3 months ago
treeverse / dvc
🦉 Data Versioning and ML Experiments
☆15,768 · Updated this week
microsoft / SynapseML
Simple and Distributed Machine Learning
☆5,232 · Jul 6, 2026 · Updated 2 weeks ago
combust / mleap
MLeap: Deploy ML Pipelines to Production
☆1,539 · Jul 10, 2026 · Updated last week
Netflix / metaflow
Build, Manage and Deploy AI/ML Systems
☆10,190 · Updated this week
uber / fiber
Distributed Computing for AI Made Simple
☆1,047 · Mar 19, 2023 · Updated 3 years ago
modin-project / modin
Modin: Scale your Pandas workflows by changing a single line of code
☆10,395 · Feb 10, 2026 · Updated 5 months ago
facebookresearch / fairscale
PyTorch extensions for high performance and large scale training.
☆3,411 · Apr 26, 2025 · Updated last year
pytorch / elastic
PyTorch elastic training
☆727 · Jun 15, 2022 · Updated 4 years ago
vaexio / vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s…
☆8,509 · Apr 1, 2026 · Updated 3 months ago
dask / dask
Parallel computing with task scheduling
☆13,865 · Updated this week
uber / neuropod
A uniform interface to run deep learning models from multiple frameworks
☆943 · Jan 3, 2024 · Updated 2 years ago
apache / arrow
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
☆16,944 · Updated this week
webdataset / webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
☆3,147 · Feb 9, 2026 · Updated 5 months ago
rapidsai / cudf
cuDF - GPU DataFrame Library
☆9,709 · Updated this week
flyteorg / flyte
Dynamic, resilient AI orchestration. Coordinate data, models, and compute as you build AI workflows.
☆7,147 · Updated this week
kubeflow / kubeflow
Machine Learning Toolkit for Kubernetes
☆15,789 · Jul 10, 2026 · Updated last week
pachyderm / pachyderm
Data-Centric Pipelines and Data Versioning
☆6,297 · Feb 3, 2025 · Updated last year
pyro-ppl / pyro
Deep universal probabilistic programming with Python and PyTorch
☆9,025 · Jul 10, 2026 · Updated last week
yahoo / TensorFlowOnSpark
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
☆3,846 · Jul 10, 2023 · Updated 3 years ago
snorkel-team / snorkel
A system for quickly generating training data with weak supervision
☆5,992 · Jun 8, 2026 · Updated last month
VertaAI / modeldb
Open Source ML Model Versioning, Metadata, and Experiment Management
☆1,745 · Jul 23, 2024 · Updated last year
nteract / papermill
📚 Parameterize, execute, and analyze notebooks
☆6,459 · Jul 6, 2026 · Updated 2 weeks ago
databricks / spark-deep-learning
Deep Learning Pipelines for Apache Spark
☆1,989 · Mar 30, 2023 · Updated 3 years ago
cortexlabs / cortex
Production infrastructure for machine learning at scale
☆8,012 · Jun 12, 2024 · Updated 2 years ago
linkedin / spark-tfrecord
Read and write TensorFlow TFRecord data from Apache Spark.
☆300 · Apr 22, 2024 · Updated 2 years ago
alteryx / featuretools
An open source python library for automated feature engineering
☆7,665 · Updated this week
NVIDIA / aistore
AIStore: scalable storage for AI applications
☆1,896 · Updated this week
ucbrise / clipper
A low-latency prediction-serving system
☆1,421 · Apr 26, 2021 · Updated 5 years ago
jax-ml / jax
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
☆36,028 · Updated this week
Lightning-AI / pytorch-lightning
Pretrain, fine-tune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
☆31,243 · Updated this week