AdeboyeML / UK_Accident_Traffic_ETL_Pipeline
This capstone project builds an end-to-end ETL (Extract-Transform-Load) data pipeline that extracts UK accident and traffic datasets from Amazon S3, cleans and transforms them with PySpark, writes the results back to S3, and finally loads them into Amazon Redshift (a distributed database), where the data can be queried for ad hoc analyses.
☆18 Updated 4 years ago
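The last stage named in the description, loading transformed data from S3 into Redshift, is commonly done with Redshift's COPY command. The following is a minimal sketch under stated assumptions, not the repository's actual code: it assumes the transformed data is stored as Parquet, and the table name, bucket path, IAM role ARN, and the helper names `CopyJob` and `build_copy_sql` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyJob:
    """One S3-to-Redshift load. All field values used below are hypothetical."""
    table: str          # target Redshift table
    s3_prefix: str      # S3 prefix holding the transformed Parquet files
    iam_role_arn: str   # role that Redshift assumes to read from S3


def build_copy_sql(job: CopyJob) -> str:
    """Return a Redshift COPY statement that loads Parquet files from S3."""
    return (
        f"COPY {job.table}\n"
        f"FROM '{job.s3_prefix}'\n"
        f"IAM_ROLE '{job.iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Hypothetical names, chosen only for illustration.
job = CopyJob(
    table="accidents",
    s3_prefix="s3://example-bucket/transformed/accidents/",
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-role",
)
copy_sql = build_copy_sql(job)
print(copy_sql)
```

The generated statement would then be executed against the cluster with whichever SQL client the pipeline uses. Building the SQL separately from executing it keeps the load step easy to inspect and test without a live cluster.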
Alternatives and similar repositories for UK_Accident_Traffic_ETL_Pipeline:
Users who are interested in UK_Accident_Traffic_ETL_Pipeline are comparing it to the repositories listed below
- Big Data Engineering practice project, including ETL with Airflow and Spark using AWS S3 and EMR ☆82 Updated 5 years ago
- Solution to all projects of Udacity's Data Engineering Nanodegree: Data Modeling with Postgres & Cassandra, Data Warehouse with Redshift,… ☆56 Updated 2 years ago
- ☆87 Updated 2 years ago
- 😈Complete End to End ETL Pipeline with Spark, Airflow, & AWS ☆44 Updated 5 years ago
- Developed an ETL pipeline for a Data Lake that extracts data from S3, processes the data using Spark, and loads the data back into S3 as … ☆16 Updated 5 years ago
- Classwork projects and home works done through Udacity data engineering nano degree ☆74 Updated last year
- Udacity Data Engineering Nanodegree Capstone Project ☆36 Updated 4 years ago
- Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow ☆141 Updated 4 years ago
- A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for … ☆134 Updated 4 years ago
- This repo contains commands that data engineers use in day to day work. ☆60 Updated 2 years ago
- Simple ETL pipeline using Python ☆25 Updated last year
- My Udacity Data Engineer Nano Degree Projects aka Udacity DEND ☆16 Updated 5 years ago
- Ravi Azure ADB ADF Repository ☆65 Updated 2 months ago
- Projects done in the Data Engineering Nanodegree by Udacity.com ☆271 Updated 5 years ago
- A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract d… ☆24 Updated 3 years ago
- PySpark functions and utilities with examples. Assists ETL process of data modeling ☆100 Updated 4 years ago
- Udacity Data Engineering Nano Degree (DEND) ☆184 Updated 5 years ago
- This repository will help you to learn about databricks concept with the help of examples. It will include all the important topics which… ☆97 Updated 8 months ago
- My solutions for the Udacity Data Engineering Nanodegree ☆33 Updated 5 years ago
- Data Engineering Capstone ☆16 Updated 5 years ago
- Hey this is the repo that has all the queries and data for my video game training series! ☆142 Updated 2 years ago
- This repo will guide you step-by-step method to create star schema dimensional model. ☆25 Updated 3 years ago
- A way for home buyers to know about factors affecting a state ☆47 Updated 6 years ago
- With everything I learned from DEZoomcamp from datatalks.club, this project performs a batch processing on AWS for the cycling dataset wh… ☆12 Updated 2 years ago
- For this project I am creating an ETL (Extract, Transform, and Load) pipeline using Python, RegEx, and SQL Database. The goal is to retri… ☆27 Updated 4 years ago
- ☆9 Updated 4 years ago
- Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow. ☆314 Updated 3 years ago
- This repo contains "Databricks Certified Data Engineer Associate" Questions and related docs. ☆130 Updated 7 months ago
- This repo is mostly created for pyspark and hive related interview questions. ☆47 Updated 3 years ago
- PySpark Cheatsheet ☆36 Updated 2 years ago