ddgope / Data-Pipelines-with-Airflow
This project helps me understand the core concepts of Apache Airflow. I have created custom operators to stage the data, fill the data warehouse, and run data quality checks as the final step. It automates the ETL pipeline and the creation of the data warehouse using Apache Airflow. Skills include: Using Airflow to …
☆82 · Updated 5 years ago
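The description above names three stages: staging the data, filling the warehouse, and running data quality checks as the final step. Below is a minimal sketch of that last step. It assumes a SQL connection and a list of "query plus expected condition" checks. It is not the repository's code. The real project uses Airflow custom operators against Redshift, but this sketch uses plain Python functions and an in-memory `sqlite3` database as a stand-in. The table names `staging_events` and `songplays` and the helpers `QualityCheck` and `run_quality_checks` are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class QualityCheck:
    """One check: a SQL query returning a single value, plus a predicate on that value."""
    sql: str
    passes: Callable[[object], bool]
    description: str


def run_quality_checks(conn: sqlite3.Connection, checks: Sequence[QualityCheck]) -> list[str]:
    """Run every check; raise ValueError listing all failures, else return the passed descriptions."""
    failures = []
    passed = []
    for check in checks:
        row = conn.execute(check.sql).fetchone()
        value = row[0] if row is not None else None
        if check.passes(value):
            passed.append(check.description)
        else:
            failures.append(f"{check.description}: got {value!r}")
    if failures:
        raise ValueError("Data quality checks failed: " + "; ".join(failures))
    return passed


# Hypothetical staging table, standing in for data copied from S3 into Redshift.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_events (user_id INTEGER, song TEXT)")
conn.executemany(
    "INSERT INTO staging_events VALUES (?, ?)",
    [(1, "a"), (2, "b"), (None, "c")],
)

# Hypothetical "fill the warehouse" step: load only rows that have a user_id.
conn.execute(
    "CREATE TABLE songplays AS "
    "SELECT user_id, song FROM staging_events WHERE user_id IS NOT NULL"
)

checks = [
    QualityCheck("SELECT COUNT(*) FROM songplays", lambda n: n > 0, "songplays is not empty"),
    QualityCheck(
        "SELECT COUNT(*) FROM songplays WHERE user_id IS NULL",
        lambda n: n == 0,
        "no NULL user_id",
    ),
]

print(run_quality_checks(conn, checks))  # prints ['songplays is not empty', 'no NULL user_id']
```

The function collects every failure before raising, so a single run reports all broken checks at once. Raising an exception is also how a pipeline step typically signals failure to an orchestrator such as Airflow.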
Alternatives and similar repositories for Data-Pipelines-with-Airflow:
Users who are interested in Data-Pipelines-with-Airflow are comparing it to the repositories listed below
- Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow ☆142 · Updated 4 years ago
- Complete End to End ETL Pipeline with Spark, Airflow, & AWS ☆45 · Updated 5 years ago
- ☆126 · Updated last month
- End to end data engineering project ☆53 · Updated 2 years ago
- Tracking and measuring neighborhood and district-level eviction rates in the city of San Francisco. ☆139 · Updated 4 years ago
- Udacity Data Engineering Nanodegree Capstone Project ☆36 · Updated 4 years ago
- PySpark functions and utilities with examples. Assists ETL process of data modeling ☆100 · Updated 4 years ago
- Sample project to demonstrate data engineering best practices ☆184 · Updated last year
- Big Data Engineering practice project, including ETL with Airflow and Spark using AWS S3 and EMR ☆82 · Updated 5 years ago
- Template for Data Engineering and Data Pipeline projects ☆109 · Updated 2 years ago
- Simple ETL pipeline using Python ☆25 · Updated last year
- Solution to all projects of Udacity's Data Engineering Nanodegree: Data Modeling with Postgres & Cassandra, Data Warehouse with Redshift, … ☆56 · Updated 2 years ago
- Simple stream processing pipeline ☆99 · Updated 9 months ago
- Code for dbt tutorial ☆155 · Updated 10 months ago
- ☆87 · Updated 2 years ago
- A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for … ☆136 · Updated 4 years ago
- RedditR for Content Engagement and Recommendation ☆21 · Updated 7 years ago
- This repo contains commands that data engineers use in day to day work. ☆60 · Updated 2 years ago
- Projects done in the Data Engineer Nanodegree Program by Udacity.com ☆149 · Updated 2 years ago
- Classwork projects and home works done through Udacity data engineering nano degree ☆74 · Updated last year
- My solutions for the Udacity Data Engineering Nanodegree ☆33 · Updated 5 years ago
- Developed an ETL pipeline for a Data Lake that extracts data from S3, processes the data using Spark, and loads the data back into S3 as … ☆16 · Updated 5 years ago
- This repo guides you step by step through creating a star schema dimensional model. ☆25 · Updated 3 years ago
- ☆11 · Updated 4 years ago
- ☆19 · Updated last year
- A template repository to create a data project with IAC, CI/CD, Data migrations, & testing ☆260 · Updated 8 months ago
- ☆180 · Updated 4 years ago
- Project for "Data pipeline design patterns" blog. ☆45 · Updated 7 months ago
- ☆60 · Updated 3 years ago
- Near real time ETL to populate a dashboard. ☆73 · Updated 9 months ago