anthonywong611 / Batch-ETL-with-AWS-EMR-and-MWAA
Creates a data pipeline on AWS that runs batch processing on a Spark cluster provisioned by Amazon EMR. The ETL is orchestrated with managed Airflow (MWAA): it extracts data from S3, transforms it with Spark, and loads the transformed data back to S3.
☆9 Updated 3 years ago
Alternatives and similar repositories for Batch-ETL-with-AWS-EMR-and-MWAA:
Users who are interested in Batch-ETL-with-AWS-EMR-and-MWAA are comparing it to the repositories listed below:
- Sample project to demonstrate data engineering best practices ☆179 Updated last year
- ☆149 Updated 2 years ago
- ☆32 Updated last year
- ☆135 Updated 2 years ago
- 😈Complete End to End ETL Pipeline with Spark, Airflow, & AWS ☆43 Updated 5 years ago
- Projects done in the Data Engineer Nanodegree Program by Udacity.com ☆135 Updated 2 years ago
- Built a Data Pipeline for a Retail store using AWS services that collects data from its transactional database (OLTP) in Snowflake and tr… ☆10 Updated last year
- ☆28 Updated last year
- Udacity Data Engineering Nanodegree Capstone Project ☆35 Updated 4 years ago
- Ultimate guide for mastering Spark Performance Tuning and Optimization concepts and for preparing for Data Engineering interviews ☆109 Updated 9 months ago
- PySpark functions and utilities with examples. Assists ETL process of data modeling ☆101 Updated 4 years ago
- Code for "Advanced data transformations in SQL" free live workshop