uber / petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
☆1,869 Updated this week
Alternatives and similar repositories for petastorm
Users who are interested in petastorm are comparing it to the libraries listed below.
- Open Source ML Model Versioning, Metadata, and Experiment Management ☆1,744 Updated last year
- A low-latency prediction-serving system ☆1,421 Updated 4 years ago
- Library for exploring and validating machine learning data ☆779 Updated 5 months ago
- MLeap: Deploy ML Pipelines to Production ☆1,527 Updated last year
- TFX is an end-to-end platform for deploying production ML pipelines ☆2,168 Updated 2 weeks ago
- For recording and retrieving metadata associated with ML developer and data scientist workflows. ☆667 Updated 8 months ago
- Distributed Computing for AI Made Simple ☆1,047 Updated 2 years ago
- Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO ☆735 Updated 3 weeks ago
- Hopsworks - Data-Intensive AI platform with a Feature Store ☆1,267 Updated 10 months ago
- Scalable Machine Learning with Dask ☆942 Updated 2 months ago
- Automated Machine Learning on Kubernetes ☆1,648 Updated this week
- NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale da… ☆1,135 Updated last month
- Universal model exchange and serialization format for decision tree forests ☆799 Updated last week
- A model-agnostic visual debugging tool for machine learning ☆1,671 Updated 10 months ago
- Input pipeline framework ☆990 Updated 4 months ago
- TonY is a framework to natively run deep learning frameworks on Apache Hadoop. ☆710 Updated 2 years ago
- PyTorch elastic training ☆729 Updated 3 years ago
- Adaptive Experimentation Platform ☆2,667 Updated last week
- Experiment tracking, ML developer tools ☆891 Updated 7 months ago
- Kubeflow’s superfood for Data Scientists ☆649 Updated last week
- Model analysis tools for TensorFlow ☆1,268 Updated 4 months ago
- Train and run PyTorch models on Apache Spark. ☆341 Updated 2 years ago
- BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF. ☆1,996 Updated 3 years ago
- An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models ☆4,698 Updated this week
- MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle ☆3,687 Updated last week
- Extended pickling support for Python objects ☆1,866 Updated last month
- Hummingbird compiles trained ML models into tensor computation for faster inference. ☆3,510 Updated 5 months ago
- Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark ☆1,538 Updated last year
- Python implementation of the Parquet columnar file format. ☆871 Updated 2 months ago
- Jupyter magics and kernels for working with remote Spark clusters ☆1,362 Updated 3 months ago