rodalbuyeh / pyspark-k8s-boilerplate
Boilerplate for PySpark on Cloud Kubernetes
☆33 Updated 4 years ago
Alternatives and similar repositories for pyspark-k8s-boilerplate
Users who are interested in pyspark-k8s-boilerplate are comparing it to the libraries listed below.
- PySpark data-pipeline testing and CICD ☆28 Updated 5 years ago
- Spark style guide ☆264 Updated last year
- Delta Lake examples ☆230 Updated last year
- A curated list of dagster code snippets for data engineers ☆56 Updated last year
- Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ☆495 Updated 2 years ago
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ☆220 Updated 3 weeks ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆96 Updated last month
- Pyspark boilerplate for running prod ready data pipeline ☆29 Updated 4 years ago
- Spark app to merge different schemas ☆23 Updated 4 years ago
- New Generation Opensource Data Stack Demo ☆449 Updated 2 years ago
- Orchestrate Spark Jobs from Kubeflow Pipelines and poll for the status. ☆52 Updated 3 years ago
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python ☆44 Updated 2 years ago
- Airflow training for the crunch conf ☆104 Updated 7 years ago
- Resources for video demonstrations and blog posts related to DataOps on AWS ☆181 Updated 3 years ago
- A Table format agnostic data sharing framework ☆41 Updated last year
- A Python Library to support running data quality rules while the spark job is running⚡ ☆190 Updated this week
- A curated list of awesome blogs, videos, tools and resources about Data Contracts ☆180 Updated last year
- Learn how to add data validation and documentation to a data pipeline built with dbt and Airflow. ☆169 Updated 2 years ago
- Great Expectations Airflow operator ☆167 Updated last week
- New generation opensource data stack ☆74 Updated 3 years ago
- A Python package that creates fine-grained dbt tasks on Apache Airflow ☆74 Updated last week
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes ☆64 Updated 3 years ago
- This repository has moved into https://github.com/dbt-labs/dbt-adapters ☆442 Updated 3 months ago
- Delta Lake helper methods in PySpark ☆323 Updated last year
- Delta Lake Documentation ☆50 Updated last year
- One click deploy docker-compose with Kafka, Spark Streaming, Zeppelin UI and Monitoring (Grafana + Kafka Manager) ☆120 Updated 4 years ago
- Fast iterative local development and testing of Apache Airflow workflows ☆201 Updated 2 months ago
- The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever… ☆268 Updated 3 weeks ago
- ☆202 Updated 2 years ago
- Docker with Airflow and Spark standalone cluster ☆261 Updated 2 years ago