rodalbuyeh / pyspark-k8s-boilerplate
Boilerplate for PySpark on Cloud Kubernetes
☆33, updated 4 years ago
Alternatives and similar repositories for pyspark-k8s-boilerplate
Users who are interested in pyspark-k8s-boilerplate are comparing it to the repositories listed below.
- New Generation Opensource Data Stack Demo ☆454, updated 2 years ago
- The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever… ☆278, updated 2 months ago
- Delta Lake examples ☆235, updated last year
- Delta Lake Documentation ☆51, updated last year
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ☆222, updated 3 weeks ago
- Delta Lake helper methods in PySpark ☆325, updated last year
- pyspark methods to enhance developer productivity 📣 👯 🎉 ☆678, updated 9 months ago
- Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ☆503, updated last month
- Resources for video demonstrations and blog posts related to DataOps on AWS ☆182, updated 3 years ago
- Simple repo to demonstrate how to submit a spark job to EMR from Airflow ☆34, updated 5 years ago
- Pyspark boilerplate for running prod ready data pipeline ☆29, updated 4 years ago
- The easiest way to run Airflow locally, with linting & tests for valid DAGs and Plugins. ☆257, updated 4 years ago
- Delta-Lake, ETL, Spark, Airflow ☆48, updated 3 years ago
- Spark style guide ☆271, updated last year
- A CLI tool to streamline getting started with Apache Airflow™ and managing multiple Airflow projects ☆225, updated 7 months ago
- This repository has moved into https://github.com/dbt-labs/dbt-adapters ☆444, updated 5 months ago
- PySpark data-pipeline testing and CICD ☆28, updated 5 years ago
- PySpark test helper methods with beautiful error messages ☆740, updated last week
- New generation opensource data stack ☆76, updated 3 years ago
- Fast iterative local development and testing of Apache Airflow workflows ☆202, updated 4 months ago
- Apache Airflow integration for dbt ☆412, updated last year
- Ingesting data with Pulumi, AWS lambdas and Snowflake in a scalable, fully replayable manner ☆71, updated 3 years ago
- A Python Library to support running data quality rules while the spark job is running⚡ ☆193, updated last week
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python ☆44, updated 2 years ago
- Python API for Deequ ☆806, updated 8 months ago
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database ☆77, updated 4 years ago
- Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture ☆125, updated last month