gopetracca / spark-app
Spark Application example with Clean Architecture.
☆10 · Updated last year
Alternatives and similar repositories for spark-app
Users who are interested in spark-app are comparing it with the repositories listed below.
- Template for Data Engineering and Data Pipeline projects ☆112 · Updated 2 years ago
- Resources from the preparation course for the Databricks Data Engineer Professional certification exam ☆114 · Updated 2 weeks ago
- Python code that collapses structured columns, separating their attributes out into new columns ☆11 · Updated 3 years ago
- Awesome content all about Azure Databricks ☆16 · Updated 3 years ago
- Notebooks to learn Databricks Lakehouse Platform ☆28 · Updated last week
- Spark data pipeline that processes movie ratings data. ☆28 · Updated this week
- PDF DataSource for Apache Spark that reads PDF files directly into a DataFrame and runs OCR on them ☆67 · Updated last month
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python ☆44 · Updated 2 years ago
- Sample project to demonstrate data engineering best practices ☆191 · Updated last year
- Example project using DBT, Databricks and the AdventureWorks sample database ☆12 · Updated 2 years ago
- Full stack data engineering tools and infrastructure set-up ☆53 · Updated 4 years ago
- A curated list of awesome Databricks resources, including Spark ☆19 · Updated 11 months ago
- Code snippets for Data Engineering Design Patterns book ☆116 · Updated 2 months ago
- Delta Lake Documentation ☆49 · Updated 11 months ago
- Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for A… ☆41 · Updated 2 years ago
- An end-to-end ELT pipeline to store simulated heart rate data inside a data warehouse; uses Kafka for real-time processing, Airbyte for d… ☆13 · Updated last year
- Code for "Advanced data transformations in SQL" free live workshop ☆81 · Updated last month
- Delta Lake examples ☆225 · Updated 7 months ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆55 · Updated 2 years ago
- Data Engineering with Spark and Delta Lake ☆99 · Updated 2 years ago
- Developed a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation,… ☆90 · Updated 3 years ago
- Step by step instructions to create a production-ready data pipeline ☆50 · Updated 5 months ago
- For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR ☆66 · Updated 3 years ago
- Execution of DBT models using Apache Airflow through Docker Compose ☆117 · Updated 2 years ago
- 🥪💾 A sample of data from the `jaffle-shop-generator` that powers the Jaffle Shop spanning one year. ☆11 · Updated 4 months ago
- A repository of sample code to show data quality checking best practices using Airflow. ☆77 · Updated 2 years ago
- Docker with Airflow and Spark standalone cluster ☆256 · Updated last year
- A repository of sample code to accompany our blog post on Airflow and dbt. ☆173 · Updated last year
- Easily create and use Python Virtualenvs in Apache Airflow ☆10 · Updated 6 months ago
- Stream processing with Azure Databricks ☆138 · Updated 5 months ago