palantir / pyspark-style-guide

This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring topics across the PySpark repos we've encountered.

☆1,167

Alternatives and similar repositories for pyspark-style-guide

Users who are interested in pyspark-style-guide are comparing it to the repositories listed below.

MrPowers / chispa
PySpark test helper methods with beautiful error messages
☆709 Updated last week
awslabs / python-deequ
Python API for Deequ
☆788 Updated 4 months ago
BasPH / data-pipelines-with-apache-airflow
Code for Data Pipelines with Apache Airflow
☆785 Updated 11 months ago
AlexIoannides / pyspark-example-project
Implementing best practices for PySpark ETL jobs and applications.
☆1,967 Updated 2 years ago
mrpowers-io / quinn
pyspark methods to enhance developer productivity 📣 👯 🎉
☆676 Updated 5 months ago
sodadata / soda-core
Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
☆2,147 Updated this week
calogica / dbt-expectations
Port(ish) of Great Expectations to dbt test macros
☆1,189 Updated 7 months ago
josephmachado / beginner_de_project
Beginner data engineering project - batch edition
☆531 Updated 6 months ago
cordon-thiago / airflow-spark
Docker with Airflow and Spark standalone cluster
☆261 Updated 2 years ago
ilya-galperin / SF-EvictionTracker
Tracking and measuring neighborhood and district-level eviction rates in the city of San Francisco.
☆139 Updated 5 years ago
davidzajac1 / zillacode
Open Source LeetCode for PySpark, Spark, Pandas and DBT/Snowflake
☆199 Updated last month
soggycactus / airflow-repo-template
The easiest way to run Airflow locally, with linting & tests for valid DAGs and Plugins.
☆257 Updated 4 years ago
san089 / goodreads_etl_pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
☆1,403 Updated 5 years ago
cartershanklin / pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
☆476 Updated 9 months ago
MrPowers / mack
Delta Lake helper methods in PySpark
☆325 Updated 11 months ago
Hiflylabs / awesome-dbt
A curated list of awesome dbt resources
☆1,503 Updated 3 months ago
kevinschaich / pyspark-cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
☆582 Updated 2 years ago
cluster-apps-on-docker / spark-standalone-cluster-on-docker
Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker.
☆496 Updated 2 years ago
EcZachly / microbatch-hourly-deduped-tutorial
☆117 Updated 2 weeks ago
spbail / dag-stack
Data pipeline with dbt, Airflow, Great Expectations
☆163 Updated 4 years ago
adidas / lakehouse-engine
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever…
☆257 Updated last week
astronomer / airflow-guides
Guides and docs to help you get up and running with Apache Airflow.
☆808 Updated last week
josephmachado / efficient_data_processing_spark
Code for "Efficient Data Processing in Spark" Course
☆326 Updated 2 months ago
EcZachly / little-book-of-pipelines
This repository goes over how to handle massive variety in data engineering
☆287 Updated 2 years ago
ankurchavda / streamify
A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!
☆728 Updated 3 years ago
astronomer / dag-factory
Construct Apache Airflow DAGs Declaratively via YAML configuration files
☆1,332 Updated this week
renatootescu / ETL-pipeline
Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow.
☆330 Updated 3 years ago
josephmachado / data_engineering_project_template
A template repository to create a data project with IAC, CI/CD, Data migrations, & testing
☆271 Updated last year
sdg-1 / data-team-handbook
☆830 Updated 3 months ago
EcZachly / video-game-training-sql
Hey this is the repo that has all the queries and data for my video game training series!
☆149 Updated 3 years ago