😈Complete End to End ETL Pipeline with Spark, Airflow, & AWS
☆51 · Aug 23, 2019 · Updated 6 years ago
Alternatives and similar repositories for DataEngineeringCapstoneProject
Users that are interested in DataEngineeringCapstoneProject are comparing it to the libraries listed below
- Big Data Engineering practice project, including ETL with Airflow and Spark using AWS S3 and EMR ☆90 · Jul 17, 2019 · Updated 6 years ago
- Udacity Data Pipeline Exercises ☆15 · Jun 6, 2020 · Updated 5 years ago
- Road to Azure Data Engineer Part-II: DP-201 - Designing an Azure Data Solution ☆19 · Aug 16, 2020 · Updated 5 years ago
- The goal of this project is to analyse the impact of Covid-19 on the Aviation industry through data engineering processes using technolog… ☆13 · Jun 26, 2022 · Updated 3 years ago
- Udacity Data Engineering Nanodegree Capstone Project ☆37 · May 9, 2020 · Updated 5 years ago
- Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow ☆163 · Jun 16, 2020 · Updated 5 years ago
- Apache Kafka Overview ☆12 · Jun 9, 2023 · Updated 2 years ago
- Delta Lake Examples ☆11 · Apr 24, 2020 · Updated 5 years ago
- A real-time streaming ETL pipeline for streaming and performing sentiment analysis on Twitter data using Apache Kafka, Apache Spark and D… ☆29 · Aug 8, 2020 · Updated 5 years ago
- Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DocumentDB as the database ☆14 · Oct 26, 2021 · Updated 4 years ago
- Nested Data (JSON/AVRO/XML) Parsing and Flattening in Spark ☆16 · Jan 22, 2024 · Updated 2 years ago
- Udacity Data Engineering Nano Degree (DEND) ☆189 · Jan 20, 2020 · Updated 6 years ago
- This project focuses on building a robust data pipeline using Apache Airflow to automate the ingestion of weather data from the OpenWeath… ☆22 · Feb 3, 2026 · Updated 3 weeks ago
- A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousin… ☆15 · Apr 29, 2021 · Updated 4 years ago
- Educational notes and hands-on problems with solutions for the Hadoop ecosystem ☆87 · Jan 22, 2019 · Updated 7 years ago
- An End-to-End ETL data pipeline that leverages pyspark parallel processing to process about 25 million rows of data coming from a SaaS ap… ☆25 · Dec 7, 2022 · Updated 3 years ago
- ☆22 · Apr 13, 2023 · Updated 2 years ago
- This project involves an ETL (Extract, Transform, Load) process to analyze sleep data exported from Apple Health ☆29 · Apr 29, 2023 · Updated 2 years ago
- Extract, Transform, Load (ETL) refers to a process in database usage and especially in data warehousing. This repository contains a s… ☆21 · Mar 20, 2017 · Updated 8 years ago
- Example end to end data engineering project. ☆1,387 · Dec 8, 2022 · Updated 3 years ago
- Apache Spark Interview Question and Answers ☆21 · Oct 13, 2020 · Updated 5 years ago
- Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin… ☆24 · Aug 11, 2023 · Updated 2 years ago
- Solution to all projects of Udacity's Data Engineering Nanodegree: Data Modeling with Postgres & Cassandra, Data Warehouse with Redshift,… ☆57 · Oct 20, 2022 · Updated 3 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆56 · May 6, 2023 · Updated 2 years ago
- Udacity Data Engineer Nano Degree - Project-3 (Data Warehouse) ☆22 · Jun 20, 2019 · Updated 6 years ago
- A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract d… ☆24 · Nov 22, 2021 · Updated 4 years ago
- ☆38 · May 22, 2024 · Updated last year
- Super Mario is a legendary game we all cherish! In this project, we will deploy Super Mario on Amazon EKS (Elastic Kubernetes Service) us… ☆11 · Feb 3, 2026 · Updated 3 weeks ago
- ☆18 · Aug 15, 2022 · Updated 3 years ago
- Spark, Airflow, Kafka ☆24 · Apr 30, 2023 · Updated 2 years ago
- Tour of Scala - Scala classes ☆32 · May 26, 2025 · Updated 9 months ago
- Few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake developme… ☆1,826 · Aug 26, 2022 · Updated 3 years ago
- A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for … ☆141 · Apr 18, 2020 · Updated 5 years ago
- Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary. ☆29 · Aug 14, 2023 · Updated 2 years ago
- ☆16 · May 26, 2025 · Updated 9 months ago
- Complete a hands-on data analysis project in 14 days ☆10 · Sep 7, 2022 · Updated 3 years ago
- Basic framework utilities to quickly start writing production ready Apache Spark applications ☆36 · Dec 15, 2024 · Updated last year
- CODO is an ontology for the semantic representation and annotation of COVID-19 data in a machine-readable form for tracking history of th… ☆10 · Apr 19, 2022 · Updated 3 years ago
- ☆12 · Updated this week