The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
☆1,880 · Jan 2, 2026 · Updated 2 months ago
Alternatives and similar repositories for petastorm
Users who are interested in petastorm are comparing it to the libraries listed below.
- Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. ☆14,679 · Dec 1, 2025 · Updated 3 months ago
- MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle ☆3,697 · Mar 9, 2026 · Updated last week
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. ☆41,799 · Updated this week
- Low-code framework for building custom LLMs, neural networks, and other AI models ☆11,657 · Updated this week
- The Open Source Feature Store for AI/ML ☆6,808 · Updated this week
- High performance model preprocessing library on PyTorch ☆648 · Mar 29, 2024 · Updated last year
- The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, a… ☆24,874 · Updated this week
- An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models ☆4,733 · Mar 9, 2026 · Updated last week
- 🦉 Data Versioning and ML Experiments ☆15,458 · Updated this week
- MLeap: Deploy ML Pipelines to Production ☆1,535 · Mar 10, 2026 · Updated last week
- Simple and Distributed Machine Learning ☆5,213 · Mar 12, 2026 · Updated last week
- Build, Manage and Deploy AI/ML Systems ☆9,956 · Updated this week
- Modin: Scale your Pandas workflows by changing a single line of code ☆10,363 · Feb 10, 2026 · Updated last month
- Distributed Computing for AI Made Simple ☆1,047 · Mar 19, 2023 · Updated 3 years ago
- PyTorch elastic training ☆729 · Jun 15, 2022 · Updated 3 years ago
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… ☆8,492 · Mar 1, 2026 · Updated 3 weeks ago
- PyTorch extensions for high performance and large scale training. ☆3,403 · Apr 26, 2025 · Updated 10 months ago
- Parallel computing with task scheduling ☆13,765 · Mar 12, 2026 · Updated last week
- A uniform interface to run deep learning models from multiple frameworks ☆940 · Jan 3, 2024 · Updated 2 years ago
- Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics ☆16,597 · Updated this week
- cuDF - GPU DataFrame Library ☆9,558 · Updated this week
- A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch. ☆3,022 · Feb 9, 2026 · Updated last month
- Dynamic, resilient AI orchestration. Coordinate data, models, and compute as you build AI workflows. Flyte 2 now available locally: https… ☆6,885 · Mar 15, 2026 · Updated last week
- Data-Centric Pipelines and Data Versioning ☆6,290 · Feb 3, 2025 · Updated last year
- Machine Learning Toolkit for Kubernetes ☆15,519 · Jan 5, 2026 · Updated 2 months ago
- Deep universal probabilistic programming with Python and PyTorch ☆8,989 · Jul 9, 2025 · Updated 8 months ago
- TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters. ☆3,858 · Jul 10, 2023 · Updated 2 years ago
- A system for quickly generating training data with weak supervision ☆5,937 · May 2, 2024 · Updated last year
- Open Source ML Model Versioning, Metadata, and Experiment Management ☆1,745 · Jul 23, 2024 · Updated last year
- 📚 Parameterize, execute, and analyze notebooks ☆6,400 · Mar 8, 2026 · Updated last week
- Production infrastructure for machine learning at scale ☆8,028 · Jun 12, 2024 · Updated last year
- Read and write Tensorflow TFRecord data from Apache Spark. ☆298 · Apr 22, 2024 · Updated last year
- An open source python library for automated feature engineering ☆7,623 · Feb 3, 2026 · Updated last month
- Hydra is a framework for elegantly configuring complex applications ☆10,264 · Feb 7, 2026 · Updated last month
- AIStore: scalable storage for AI applications ☆1,779 · Updated this week
- Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and… ☆10,790 · Mar 13, 2026 · Updated last week
- Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes. ☆30,926 · Mar 10, 2026 · Updated last week
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more ☆35,108 · Updated this week
- A low-latency prediction-serving system ☆1,421 · Apr 26, 2021 · Updated 4 years ago