The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
☆1,879 · Jan 2, 2026 · Updated last month
Alternatives and similar repositories for petastorm
Users who are interested in petastorm are comparing it to the libraries listed below.
- Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. ☆14,675 · Dec 1, 2025 · Updated 3 months ago
- MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle ☆3,697 · Feb 21, 2026 · Updated last week
- The Open Source Feature Store for AI/ML ☆6,737 · Updated this week
- Low-code framework for building custom LLMs, neural networks, and other AI models ☆11,651 · Updated this week
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. ☆41,516 · Updated this week
- The open source developer platform to build AI agents and models with confidence. Enhance your AI applications with end-to-end tracking, … ☆24,485 · Updated this week
- High performance model preprocessing library on PyTorch ☆646 · Mar 29, 2024 · Updated last year
- 🦉 Data Versioning and ML Experiments ☆15,404 · Updated this week
- An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models ☆4,730 · Feb 16, 2026 · Updated last week
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… ☆8,475 · Feb 5, 2026 · Updated 3 weeks ago
- MLeap: Deploy ML Pipelines to Production ☆1,535 · Jan 12, 2026 · Updated last month
- cuDF - GPU DataFrame Library ☆9,498 · Updated this week
- Modin: Scale your Pandas workflows by changing a single line of code ☆10,362 · Feb 10, 2026 · Updated 2 weeks ago
- Simple and Distributed Machine Learning ☆5,200 · Feb 14, 2026 · Updated 2 weeks ago
- Build, Manage and Deploy AI/ML Systems ☆9,863 · Updated this week
- Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks. ☆6,754 · Feb 21, 2026 · Updated last week
- Parallel computing with task scheduling ☆13,746 · Feb 22, 2026 · Updated last week
- Data-Centric Pipelines and Data Versioning ☆6,286 · Feb 3, 2025 · Updated last year
- PyTorch extensions for high performance and large scale training. ☆3,400 · Apr 26, 2025 · Updated 10 months ago
- PyTorch elastic training ☆728 · Jun 15, 2022 · Updated 3 years ago
- Open Source ML Model Versioning, Metadata, and Experiment Management ☆1,744 · Jul 23, 2024 · Updated last year
- Distributed Computing for AI Made Simple ☆1,047 · Mar 19, 2023 · Updated 2 years ago
- 📚 Parameterize, execute, and analyze notebooks ☆6,388 · Jan 5, 2026 · Updated last month
- A system for quickly generating training data with weak supervision ☆5,939 · May 2, 2024 · Updated last year
- Deep universal probabilistic programming with Python and PyTorch ☆8,983 · Jul 9, 2025 · Updated 7 months ago
- Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics ☆16,543 · Updated this week
- A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch. ☆2,997 · Feb 9, 2026 · Updated 2 weeks ago
- A uniform interface to run deep learning models from multiple frameworks ☆940 · Jan 3, 2024 · Updated 2 years ago
- An open source python library for automated feature engineering ☆7,614 · Feb 3, 2026 · Updated 3 weeks ago
- Production infrastructure for machine learning at scale ☆8,031 · Jun 12, 2024 · Updated last year
- Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and… ☆10,768 · Updated this week
- Machine Learning Toolkit for Kubernetes ☆15,462 · Jan 5, 2026 · Updated last month
- ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling … ☆6,548 · Updated this week
- Hopsworks - Data-Intensive AI platform with a Feature Store ☆1,285 · Feb 10, 2025 · Updated last year
- Hydra is a framework for elegantly configuring complex applications ☆10,231 · Feb 7, 2026 · Updated 3 weeks ago
- TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows… ☆2,272 · Sep 29, 2023 · Updated 2 years ago
- TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters. ☆3,858 · Jul 10, 2023 · Updated 2 years ago
- Hummingbird compiles trained ML models into tensor computation for faster inference. ☆3,529 · Jul 17, 2025 · Updated 7 months ago
- An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr… ☆8,602 · Feb 21, 2026 · Updated last week