palantir / pyspark-style-guide
This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring topics across the PySpark repos we've encountered.
★ 1,137 · Updated 7 months ago
Alternatives and similar repositories for pyspark-style-guide:
Users that are interested in pyspark-style-guide are comparing it to the libraries listed below
- PySpark test helper methods with beautiful error messages · ★ 686 · Updated 3 weeks ago
- Python API for Deequ · ★ 766 · Updated last month
- pyspark methods to enhance developer productivity · ★ 670 · Updated 2 months ago
- Implementing best practices for PySpark ETL jobs and applications. · ★ 1,904 · Updated 2 years ago
- Spark style guide · ★ 258 · Updated 7 months ago
- Delta Lake helper methods in PySpark · ★ 322 · Updated 8 months ago
- A Data Engineering & Machine Learning Knowledge Hub · ★ 1,127 · Updated last year
- A curated list of awesome dbt resources · ★ 1,434 · Updated 2 weeks ago
- Code for Data Pipelines with Apache Airflow · ★ 766 · Updated 8 months ago
- Port(ish) of Great Expectations to dbt test macros · ★ 1,162 · Updated 4 months ago
- PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster · ★ 454 · Updated 6 months ago
- Accumulated knowledge and experience in the field of Data Engineering · ★ 868 · Updated 2 years ago
- Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io · ★ 2,079 · Updated this week
- Beginner data engineering project - batch edition · ★ 516 · Updated 3 months ago
- Supplementary Materials for The Complete dbt (Data Build Tool) Bootcamp Udemy course · ★ 565 · Updated last week
- Code for "Efficient Data Processing in Spark" Course · ★ 296 · Updated 7 months ago
- Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code · ★ 919 · Updated this week
- Practical Data Engineering: A Hands-On Real-Estate Project Guide · ★ 647 · Updated 8 months ago
- Utility functions for dbt projects. · ★ 1,516 · Updated last month
- Pyspark RDD, DataFrame and Dataset Examples in Python language · ★ 1,250 · Updated last year
- Quick reference guide to common patterns & functions in PySpark. · ★ 532 · Updated 2 years ago
- A self-contained dbt project for testing purposes · ★ 493 · Updated 7 months ago
- ★ 776 · Updated 2 weeks ago
- A comprehensive Spark guide collated from multiple sources that can be referred to learn more about Spark or as an interview refresher. · ★ 672 · Updated 3 years ago
- dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks · ★ 427 · Updated 2 months ago
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… · ★ 2,077 · Updated last month
- Guides and docs to help you get up and running with Apache Airflow. · ★ 808 · Updated 2 years ago
- The best place to learn data engineering. Built and maintained by the data engineering community. · ★ 1,660 · Updated 3 weeks ago
- dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org) · ★ 1,061 · Updated last week
- An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform. · ★ 1,373 · Updated 5 years ago