rodalbuyeh / pyspark-k8s-boilerplate
Boilerplate for PySpark on Cloud Kubernetes
☆33 Updated 3 years ago
Alternatives and similar repositories for pyspark-k8s-boilerplate
Users that are interested in pyspark-k8s-boilerplate are comparing it to the libraries listed below
- Pyspark boilerplate for running prod ready data pipeline ☆28 Updated 4 years ago
- PySpark data-pipeline testing and CICD ☆28 Updated 4 years ago
- A repository of sample code to show data quality checking best practices using Airflow. ☆77 Updated 2 years ago
- Spark data pipeline that processes movie ratings data. ☆28 Updated this week
- ☆21 Updated 4 years ago
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python ☆44 Updated 2 years ago
- Full stack data engineering tools and infrastructure set-up ☆53 Updated 4 years ago
- Delta Lake Documentation ☆49 Updated 11 months ago
- Delta-Lake, ETL, Spark, Airflow ☆47 Updated 2 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆55 Updated 2 years ago
- ☆21 Updated 2 months ago
- Materials of the Official Helm Chart Webinar ☆27 Updated 3 years ago
- A repository of sample code to accompany our blog post on Airflow and dbt. ☆173 Updated last year
- Fast iterative local development and testing of Apache Airflow workflows ☆201 Updated last month
- Airflow Providers containing Deferrable Operators & Sensors from Astronomer ☆148 Updated this week
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes ☆63 Updated 2 years ago
- Building Data Lakehouse by open source technology. Support end to end data pipeline, from source data on AWS S3 to Lakehouse, visualize a… ☆27 Updated last year
- To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a… ☆35 Updated last year
- Execution of DBT models using Apache Airflow through Docker Compose ☆117 Updated 2 years ago
- Data validation library for PySpark 3.0.0 ☆33 Updated 2 years ago
- Pylint plugin for static code analysis on Airflow code ☆95 Updated 4 years ago
- Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service ☆68 Updated last year
- Make simple storing test results and visualisation of these in a BI dashboard ☆44 Updated 2 months ago
- A Python package that creates fine-grained dbt tasks on Apache Airflow ☆70 Updated 8 months ago
- Delta Lake examples ☆225 Updated 7 months ago
- This repo helps bootstrap the infrastructures with a modern data stack on Google Cloud Platform using Terraform. ☆116 Updated 3 years ago
- Code snippets for Data Engineering Design Patterns book ☆116 Updated 2 months ago
- A bunch of hacks developed around dbt ☆48 Updated 5 years ago
- Spark and Delta Lake Workshop ☆22 Updated 2 years ago
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database ☆75 Updated 3 years ago