The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
☆1,889 · Jan 2, 2026 · Updated 5 months ago
Alternatives and similar repositories for petastorm
Users that are interested in petastorm are comparing it to the libraries listed below.
- Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. ☆14,692 · Jun 20, 2026 · Updated last week
- AI Infra / AI Orchestration / AI Control Plane ☆3,709 · Jun 23, 2026 · Updated last week
- Low-code framework for building custom LLMs, neural networks, and other AI models ☆11,728 · Updated this week
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. ☆43,025 · Updated this week
- The Open Source Feature Store for AI/ML ☆7,109 · Updated this week
- High performance model preprocessing library on PyTorch ☆642 · Mar 29, 2024 · Updated 2 years ago
- The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, a… ☆26,741 · Updated this week
- An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models ☆4,755 · Mar 23, 2026 · Updated 3 months ago
- MLeap: Deploy ML Pipelines to Production ☆1,539 · Mar 10, 2026 · Updated 3 months ago
- 🦉 Data Versioning and ML Experiments ☆15,708 · Jun 24, 2026 · Updated last week
- Simple and Distributed Machine Learning ☆5,231 · Updated this week
- Modin: Scale your Pandas workflows by changing a single line of code ☆10,389 · Feb 10, 2026 · Updated 4 months ago
- Build, Manage and Deploy AI/ML Systems ☆10,143 · Jun 24, 2026 · Updated last week
- Distributed Computing for AI Made Simple ☆1,046 · Mar 19, 2023 · Updated 3 years ago
- PyTorch elastic training ☆728 · Jun 15, 2022 · Updated 4 years ago
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… ☆8,507 · Apr 1, 2026 · Updated 3 months ago
- PyTorch extensions for high performance and large scale training. ☆3,409 · Apr 26, 2025 · Updated last year
- Parallel computing with task scheduling ☆13,856 · Updated this week
- A uniform interface to run deep learning models from multiple frameworks ☆943 · Jan 3, 2024 · Updated 2 years ago
- Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics ☆16,878 · Updated this week
- cuDF - GPU DataFrame Library ☆9,692 · Updated this week
- A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch. ☆3,125 · Feb 9, 2026 · Updated 4 months ago
- Dynamic, resilient AI orchestration. Coordinate data, models, and compute as you build AI workflows. ☆7,119 · Updated this week
- Data-Centric Pipelines and Data Versioning ☆6,296 · Feb 3, 2025 · Updated last year
- Machine Learning Toolkit for Kubernetes ☆15,750 · Jun 18, 2026 · Updated last week
- Deep universal probabilistic programming with Python and PyTorch ☆9,015 · Jun 5, 2026 · Updated 3 weeks ago
- TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters. ☆3,848 · Jul 10, 2023 · Updated 2 years ago
- A system for quickly generating training data with weak supervision ☆5,982 · Jun 8, 2026 · Updated 3 weeks ago
- Open Source ML Model Versioning, Metadata, and Experiment Management ☆1,747 · Jul 23, 2024 · Updated last year
- 📚 Parameterize, execute, and analyze notebooks ☆6,456 · May 12, 2026 · Updated last month
- Production infrastructure for machine learning at scale ☆8,010 · Jun 12, 2024 · Updated 2 years ago
- Read and write TensorFlow TFRecord data from Apache Spark. ☆300 · Apr 22, 2024 · Updated 2 years ago
- An open source Python library for automated feature engineering ☆7,659 · Jun 17, 2026 · Updated 2 weeks ago
- Hydra is a framework for elegantly configuring complex applications ☆10,473 · Updated this week
- AIStore: scalable storage for AI applications ☆1,884 · Updated this week
- Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and… ☆10,903 · Updated this week
- Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes. ☆31,209 · Jun 10, 2026 · Updated 3 weeks ago
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more ☆35,930 · Updated this week
- A low-latency prediction-serving system ☆1,422 · Apr 26, 2021 · Updated 5 years ago