AdeboyeML / UK_Accident_Traffic_ETL_Pipeline
This capstone project builds an end-to-end ETL (Extract-Transform-Load) data pipeline that extracts UK accident and traffic datasets from Amazon S3, cleans and transforms them with PySpark, writes the results back to S3, and finally loads them into Amazon Redshift (a distributed database), where the data can be queried for ad-hoc analyses.
☆17 Updated 4 years ago
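The final load step described above (S3 to Redshift) is commonly done with Redshift's `COPY` command. The following is a minimal sketch of how such a statement might be assembled, not the project's actual code. The function name `build_copy_statement`, the table name, the bucket path, the IAM role ARN, and the choice of Parquet as the file format are all assumptions made for illustration.

```python
def build_copy_statement(table: str, s3_path: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads Parquet files from S3.

    Hypothetical helper: the repository description does not say which file
    format or table names the pipeline uses; Parquet is assumed here.
    """
    if not s3_path.startswith("s3://"):
        raise ValueError(f"expected an s3:// path, got {s3_path!r}")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Placeholder values only; none of these come from the project itself.
statement = build_copy_statement(
    table="accidents",
    s3_path="s3://example-bucket/clean/accidents/",
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-role",
)
print(statement)
```

The resulting string would then be executed against the cluster through whichever database client the pipeline uses. Building the statement separately keeps the load step easy to inspect and test without a live Redshift connection.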
Related projects
Alternatives and complementary repositories for UK_Accident_Traffic_ETL_Pipeline
- Solution to all projects of Udacity's Data Engineering Nanodegree: Data Modeling with Postgres & Cassandra, Data Warehouse with Redshift,… ☆56 Updated 2 years ago
- Big Data Engineering practice project, including ETL with Airflow and Spark using AWS S3 and EMR ☆80 Updated 5 years ago
- Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow ☆133 Updated 4 years ago
- Developed an ETL pipeline for a Data Lake that extracts data from S3, processes the data using Spark, and loads the data back into S3 as … ☆16 Updated 5 years ago
- PySpark functions and utilities with examples. Assists ETL process of data modeling ☆99 Updated 3 years ago
- PySpark Cheatsheet ☆35 Updated last year
- Classwork projects and home works done through Udacity data engineering nano degree ☆74 Updated 11 months ago
- ☆86 Updated 2 years ago
- Udacity Data Engineering Nanodegree Program ☆51 Updated 3 years ago
- 😈 Complete End to End ETL Pipeline with Spark, Airflow, & AWS ☆43 Updated 5 years ago
- Projects done in the Data Engineer Nanodegree Program by Udacity.com ☆94 Updated last year
- Hey this is the repo that has all the queries and data for my video game training series! ☆133 Updated 2 years ago
- This repository will help you to learn about databricks concept with the help of examples. It will include all the important topics which… ☆92 Updated 3 months ago
- Ravi Azure ADB ADF Repository ☆64 Updated 6 months ago
- This repo contains commands that data engineers use in day to day work. ☆59 Updated last year
- Udacity Data Engineering Nanodegree Capstone Project ☆35 Updated 4 years ago
- This project helps me to understand the core concepts of Apache Airflow. I have created custom operators to perform tasks such as staging… ☆74 Updated 5 years ago
- A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for … ☆132 Updated 4 years ago
- Developed a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation,… ☆89 Updated 3 years ago
- Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow. ☆297 Updated 2 years ago
- Projects done in the Data Engineering Nanodegree by Udacity.com ☆269 Updated 5 years ago
- This repo contains "Databricks Certified Data Engineer Associate" Questions and related docs. ☆88 Updated 3 months ago
- A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract d… ☆24 Updated 3 years ago
- A way for home buyers to know about factors affecting a state ☆47 Updated 5 years ago
- This repo is mostly created for pyspark and hive related interview questions. ☆46 Updated 2 years ago
- Contains spark dataframe solutions of leetcode questions ☆24 Updated last year
- ☆31 Updated 6 years ago
- Udacity Data Engineering Nano Degree (DEND) ☆184 Updated 4 years ago
- Udacity's 5 Month Data Engineering Nanodegree program. This repo includes all the projects completed. ☆27 Updated 4 years ago