marcelonyc / igz_sparkk8s
MLOps NYC 2019 training session: Running Spark on Kubernetes
☆18 · Updated 3 years ago
Alternatives and similar repositories for igz_sparkk8s:
Users interested in igz_sparkk8s are comparing it to the repositories listed below:
- spark on kubernetes ☆105 · Updated 2 years ago
- Run EMR workloads on EKS ☆13 · Updated 3 years ago
- Building Big Data Pipelines with Apache Beam, published by Packt ☆86 · Updated last year
- Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service ☆64 · Updated 9 months ago
- Orchestrate Spark Jobs from Kubeflow Pipelines and poll for the status. ☆52 · Updated 2 years ago
- This code is used to build & run a Docker container for performing predictions against a Spark ML Pipeline. ☆53 · Updated last year
- Spark ETL example processing New York taxi rides public dataset on EKS ☆44 · Updated 2 years ago
- Interactive Notebooks that support the book ☆39 · Updated 4 years ago
- Spark on Kubernetes using Helm ☆34 · Updated 4 years ago
- Cassandra + Spark = ❤️ Machine Learning with Apache Spark & Cassandra ☆20 · Updated 3 years ago
- The Internals of Spark on Kubernetes ☆70 · Updated 2 years ago
- KubeFlow on AWS ☆178 · Updated last month
- Examples for using Amazon SageMaker components in Kubeflow Pipelines ☆22 · Updated 4 years ago
- O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian ☆211 · Updated last year
- Big Data Demystified meetup and blog examples ☆31 · Updated 6 months ago
- ☆20 · Updated 5 years ago
- Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for A… ☆40 · Updated 2 years ago
- This is a repository for the Duke University Cloud Computing course project on Serverless Data Engineering Pipeline. For this project, I r… ☆19 · Updated 3 years ago
- Automated Machine Learning on AWS, published by Packt ☆45 · Updated last year
- Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce. ☆38 · Updated 2 years ago
- Full stack data engineering tools and infrastructure set-up ☆49 · Updated 4 years ago
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform ☆47 · Updated last month
- Delta Lake Documentation ☆48 · Updated 8 months ago
- mlflow_on_kubeflow ☆14 · Updated 3 years ago
- Materials for the next course ☆24 · Updated 2 years ago
- real-time data + ML pipeline ☆54 · Updated 3 weeks ago
- Sample code to collect Apache Iceberg metrics for table monitoring ☆24 · Updated 6 months ago
- ☆17 · Updated 2 years ago
- ☆36 · Updated 2 years ago
- Materials of the Official Helm Chart Webinar ☆27 · Updated 3 years ago