josephmachado / spark_submit_airflow
Simple repo to demonstrate how to submit a Spark job to EMR from Airflow
☆31 Updated 3 years ago
Related projects:
- Simple stream processing pipeline ☆89 Updated 3 months ago
- ☆25 Updated last year
- Docker with Airflow and Spark standalone cluster ☆239 Updated last year
- ☆100 Updated last month
- Code for dbt tutorial ☆138 Updated 3 months ago
- A repository of sample code to accompany our blog post on Airflow and dbt. ☆167 Updated last year
- End to end data engineering project ☆49 Updated last year
- A dbt adapter for Databricks. ☆211 Updated this week
- Learn how to add data validation and documentation to a data pipeline built with dbt and Airflow. ☆167 Updated 10 months ago
- A template repository to create a data project with IAC, CI/CD, Data migrations, & testing ☆225 Updated 2 months ago
- Step-by-step tutorial on building a Kimball dimensional model with dbt ☆100 Updated 2 months ago
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. ☆39 Updated 3 years ago
- Sample project to demonstrate data engineering best practices ☆156 Updated 6 months ago
- Data pipeline with dbt, Airflow, Great Expectations ☆155 Updated 3 years ago
- Delta Lake examples ☆201 Updated 3 months ago
- Execution of DBT models using Apache Airflow through Docker Compose ☆111 Updated last year
- 😈 Complete End to End ETL Pipeline with Spark, Airflow, & AWS ☆39 Updated 5 years ago
- This repository helps teach people how to correctly define and create cumulative tables! ☆209 Updated last month
- A repository of sample code to show data quality checking best practices using Airflow. ☆71 Updated last year
- Tracking and measuring neighborhood and district-level eviction rates in the city of San Francisco. ☆138 Updated 4 years ago
- The resources of the preparation course for Databricks Data Engineer Professional certification exam ☆71 Updated 9 months ago
- A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow. ☆170 Updated 2 months ago
- ☆14 Updated 5 years ago
- Template for Data Engineering and Data Pipeline projects ☆101 Updated last year
- With everything I learned from DEZoomcamp from datatalks.club, this project performs a batch processing on AWS for the cycling dataset wh… ☆12 Updated 2 years ago
- Delta-Lake, ETL, Spark, Airflow ☆42 Updated last year
- Near real time ETL to populate a dashboard. ☆69 Updated 3 months ago
- How to unit test your PySpark code ☆27 Updated 3 years ago
- This project is for demonstrating knowledge of Data Engineering tools and concepts and also learning in the process ☆44 Updated last year
- Containerized end-to-end analytics of Spotify data using Python, dbt, Postgres, and Metabase ☆121 Updated 2 years ago