uber / petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
☆1,828 Updated last year
Alternatives and similar repositories for petastorm:
Users who are interested in petastorm compare it to the libraries listed below.
- TFX is an end-to-end platform for deploying production ML pipelines ☆2,142 Updated 2 weeks ago
- Automated Machine Learning on Kubernetes ☆1,563 Updated this week
- Open Source ML Model Versioning, Metadata, and Experiment Management ☆1,722 Updated 8 months ago
- A low-latency prediction-serving system ☆1,414 Updated 3 years ago
- Distributed Computing for AI Made Simple ☆1,045 Updated 2 years ago
- PyTorch elastic training ☆730 Updated 2 years ago
- MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle ☆3,628 Updated last month
- For recording and retrieving metadata associated with ML developer and data scientist workflows. ☆642 Updated last week
- Library for exploring and validating machine learning data ☆769 Updated last month
- MLeap: Deploy ML Pipelines to Production ☆1,516 Updated 4 months ago
- NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale da… ☆1,078 Updated 7 months ago
- High performance model preprocessing library on PyTorch ☆650 Updated last year
- Scalable Machine Learning with Dask ☆927 Updated 2 months ago
- A uniform interface to run deep learning models from multiple frameworks ☆936 Updated last year
- A model-agnostic visual debugging tool for machine learning ☆1,657 Updated 2 months ago
- Extended pickling support for Python objects ☆1,738 Updated 2 weeks ago
- Hopsworks - Data-Intensive AI platform with a Feature Store ☆1,219 Updated 2 months ago
- TonY is a framework to natively run deep learning frameworks on Apache Hadoop. ☆706 Updated last year
- Model analysis tools for TensorFlow ☆1,262 Updated last week
- Multi Model Server is a tool for serving neural net models for inference ☆1,008 Updated 10 months ago
- BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF. ☆1,958 Updated 2 years ago
- A system for quickly generating training data with weak supervision ☆5,843 Updated 11 months ago
- Input pipeline framework ☆987 Updated 3 weeks ago
- Train and run PyTorch models on Apache Spark. ☆339 Updated last year
- An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models ☆4,498 Updated this week
- Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA. ☆4,295 Updated 4 months ago
- Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO ☆723 Updated this week
- The Open Source Feature Store for AI/ML ☆5,934 Updated this week
- Adaptive Experimentation Platform ☆2,467 Updated this week
- Kubeflow’s superfood for Data Scientists ☆632 Updated 2 years ago