rodalbuyeh / pyspark-k8s-boilerplate
Boilerplate for PySpark on Cloud Kubernetes
☆33 Updated 2 years ago
Related projects:
- PySpark boilerplate for running a prod-ready data pipeline ☆29 Updated 3 years ago
- Delta Lake Documentation ☆45 Updated 3 months ago
- ☆38 Updated this week
- ☆20 Updated 3 years ago
- PySpark data-pipeline testing and CI/CD ☆28 Updated 3 years ago
- Delta Lake examples ☆201 Updated 3 months ago
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database ☆64 Updated 3 years ago
- Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service ☆63 Updated 4 months ago
- Learn how to add data validation and documentation to a data pipeline built with dbt and Airflow. ☆167 Updated 10 months ago
- Sample Airflow DAGs ☆60 Updated last year
- A Table format agnostic data sharing framework ☆36 Updated 7 months ago
- Great Expectations Airflow operator ☆158 Updated 2 weeks ago
- Rules based grant management for Snowflake ☆40 Updated 5 years ago
- Delta reader for the Ray open-source toolkit for building ML applications ☆40 Updated 7 months ago
- A write-audit-publish implementation on a data lake without the JVM ☆39 Updated last month
- Sample project that uses Dagster, dbt, DuckDB and Dash to visualize the Spanish car and motorcycle market ☆52 Updated last year
- A repository of sample code to accompany our blog post on Airflow and dbt. ☆167 Updated last year
- Any Airflow project day 1, you can spin up a local desktop Kubernetes Airflow environment AND one in Google Cloud Composer with tested da… ☆110 Updated 11 months ago
- Code snippets for Data Engineering Design Patterns book ☆27 Updated this week
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ☆185 Updated this week
- A repository of sample code to show data quality checking best practices using Airflow. ☆71 Updated last year
- Airflow training for the crunch conf ☆105 Updated 5 years ago
- Simple stream processing pipeline ☆89 Updated 3 months ago
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python ☆43 Updated last year
- Accelerator to rapidly deploy customized features for your business ☆55 Updated 9 months ago
- New generation open-source data stack ☆60 Updated 2 years ago
- Delta Lake, ETL, Spark, Airflow ☆42 Updated last year
- Data pipeline with dbt, Airflow, Great Expectations ☆155 Updated 3 years ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ☆96 Updated last year
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆53 Updated last year