uber / petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
☆1,822 Updated last year
Alternatives and similar repositories for petastorm:
Users who are interested in petastorm are comparing it to the libraries listed below.
- Open Source ML Model Versioning, Metadata, and Experiment Management ☆1,719 Updated 7 months ago
- MLeap: Deploy ML Pipelines to Production ☆1,515 Updated 3 months ago
- Automated Machine Learning on Kubernetes ☆1,550 Updated this week
- A low-latency prediction-serving system ☆1,413 Updated 3 years ago
- Library for exploring and validating machine learning data ☆768 Updated last week
- Distributed Computing for AI Made Simple ☆1,044 Updated 2 years ago
- TFX is an end-to-end platform for deploying production ML pipelines ☆2,136 Updated last week
- Scalable Machine Learning with Dask ☆922 Updated last month
- TonY is a framework to natively run deep learning frameworks on Apache Hadoop. ☆707 Updated last year
- For recording and retrieving metadata associated with ML developer and data scientist workflows. ☆640 Updated 4 months ago
- Hopsworks - Data-Intensive AI platform with a Feature Store ☆1,213 Updated last month
- Universal model exchange and serialization format for decision tree forests ☆757 Updated last week
- MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle ☆3,613 Updated 2 weeks ago
- Hummingbird compiles trained ML models into tensor computation for faster inference. ☆3,401 Updated 2 months ago
- Adaptive Experimentation Platform ☆2,447 Updated this week
- A system for quickly generating training data with weak supervision ☆5,840 Updated 10 months ago
- Serve, optimize and scale PyTorch models in production ☆4,301 Updated this week
- PyTorch elastic training ☆730 Updated 2 years ago
- High performance model preprocessing library on PyTorch ☆650 Updated 11 months ago
- TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows… ☆2,254 Updated last year
- cuML - RAPIDS Machine Learning Library ☆4,464 Updated this week
- Model analysis tools for TensorFlow ☆1,262 Updated last month
- PyTorch extensions for high performance and large scale training. ☆3,278 Updated 2 months ago
- An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models ☆4,488 Updated this week
- Source code/webpage/demos for the What-If Tool ☆942 Updated 6 months ago
- BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF. ☆1,954 Updated 2 years ago
- A model-agnostic visual debugging tool for machine learning ☆1,657 Updated last month
- Collective communications library with various primitives for multi-machine training. ☆1,277 Updated this week
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… ☆2,055 Updated 6 months ago
- NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale da… ☆1,071 Updated 6 months ago