marcelonyc / igz_sparkk8s
MLOps NYC 2019 training session: Running Spark on Kubernetes
☆18 · Updated 3 years ago
Alternatives and similar repositories for igz_sparkk8s:
Users interested in igz_sparkk8s are comparing it to the libraries listed below:
- spark on kubernetes ☆105 · Updated 2 years ago
- Source code for 'PySpark Recipes' by Raju Kumar Mishra ☆25 · Updated 5 years ago
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python ☆43 · Updated 2 years ago
- Orchestrate Spark Jobs from Kubeflow Pipelines and poll for the status. ☆53 · Updated 2 years ago
- Materials of the Official Helm Chart Webinar ☆27 · Updated 3 years ago
- Cassandra + Spark = ❤️ Machine Learning with Apache Spark & Cassandra ☆20 · Updated 3 years ago
- Spark on Kubernetes using Helm ☆34 · Updated 4 years ago
- Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for A… ☆41 · Updated 2 years ago
- Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce. ☆38 · Updated 2 years ago
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 · Updated 2 years ago
- Delta reader for the Ray open-source toolkit for building ML applications ☆45 · Updated last year
- Interactive Notebooks that support the book ☆40 · Updated 4 years ago
- Spark ETL example processing New York taxi rides public dataset on EKS ☆44 · Updated 2 years ago
- Supplementary material for Building a Modern Data Platform with Snowflake, from Pearson. ☆21 · Updated 3 years ago
- Data Engineering with Spark and Delta Lake ☆98 · Updated 2 years ago
- This construct builds some elements for you to quickly launch an EMR Serverless application. After submitting the Emr Serverless job, you… ☆11 · Updated this week
- CICD pipeline that deploys a dbt image on a GKE cluster ☆11 · Updated 3 years ago
- The Python fake data producer for Apache Kafka® is a complete demo app allowing you to quickly produce JSON fake streaming datasets and … ☆85 · Updated last year
- The Internals of Spark on Kubernetes ☆71 · Updated 2 years ago
- Deploy your Spark Production Cluster on Kubernetes ☆47 · Updated 4 years ago
- Machine Learning on Kubernetes, published by packt ☆74 · Updated last year
- ☆75 · Updated 3 months ago
- AWS Big Data Certification ☆25 · Updated 3 months ago
- Examples for using Amazon SageMaker components in Kubeflow Pipelines ☆22 · Updated 4 years ago
- Sample Airflow DAGs ☆62 · Updated 2 years ago
- Sample code to collect Apache Iceberg metrics for table monitoring ☆26 · Updated 8 months ago
- Demo for GitHub Universe 2022 ☆12 · Updated 2 years ago
- Performance optimization for Spark running on Kubernetes ☆87 · Updated 4 years ago
- Debussy is an opinionated Data Architecture and Engineering framework, enabling data analysts and engineers to build better platforms and… ☆28 · Updated 2 years ago
- Apache Beam examples for running on Google Cloud Dataflow. ☆30 · Updated 6 years ago