cometta / python-apache-beam-spark
Example of how to deploy Apache Beam and a Spark cluster on Kubernetes and run Python code
☆19 Updated 4 years ago
Alternatives and similar repositories for python-apache-beam-spark
Users who are interested in python-apache-beam-spark are comparing it to the repositories listed below
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ☆103 Updated 3 years ago
- The Internals of Spark on Kubernetes ☆72 Updated 3 years ago
- BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables. ☆420 Updated this week
- ☆110 Updated last year
- Spark on Kubernetes using Helm ☆33 Updated 5 years ago
- Repository for Beam College sessions ☆112 Updated 4 years ago
- Don't Panic. This guide will help you when it feels like the end of the world. ☆30 Updated 5 months ago
- The Internals of Delta Lake ☆187 Updated 2 months ago
- Oozie Workflow to Airflow DAGs migration tool ☆90 Updated 3 weeks ago
- Resource for the book Trino: The Definitive Guide (and formerly Presto: The Definitive Guide) ☆231 Updated 3 years ago
- ☆269 Updated last year
- An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR ☆175 Updated 8 months ago
- Snowflake Data Source for Apache Spark. ☆230 Updated 3 weeks ago
- ☆65 Updated last year
- Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud PubSub ☆37 Updated 7 years ago
- Cloud Dataproc: Samples and Utils ☆206 Updated 3 weeks ago
- A curated list of Apache Flink learning resources ☆122 Updated last year
- Official Dockerfile for Apache Spark ☆165 Updated this week
- Spark style guide ☆272 Updated last year
- Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service ☆76 Updated last year
- Multiple node presto cluster on docker container ☆126 Updated 3 years ago
- 📚 Tech blogs & talks by companies that run Apache Flink in production ☆188 Updated last month
- A curated list of awesome resources for Apache Beam ☆145 Updated 3 years ago
- The Internals of PySpark ☆27 Updated last year
- Apache Flink Demo Projects ☆44 Updated last week
- Delta Lake examples ☆238 Updated last year
- O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian ☆227 Updated 2 years ago
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database ☆76 Updated 4 years ago
- Helm charts for Trino and Trino Gateway ☆191 Updated last week
- Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow. ☆147 Updated last year