AdeboyeML / UK_Accident_Traffic_ETL_Pipeline
This is a capstone project that builds an end-to-end ETL (Extract-Transform-Load) data pipeline: it extracts UK accident and traffic datasets from Amazon S3, cleans and transforms them with PySpark, writes the results back to S3, and finally loads them into Amazon Redshift (a distributed database), where the data can be queried for ad-hoc analyses.
☆18 Updated 4 years ago
Alternatives and similar repositories for UK_Accident_Traffic_ETL_Pipeline
Users who are interested in UK_Accident_Traffic_ETL_Pipeline are comparing it to the repositories listed below.
- Big Data Engineering practice project, including ETL with Airflow and Spark using AWS S3 and EMR ☆83 Updated 5 years ago
- Solution to all projects of Udacity's Data Engineering Nanodegree: Data Modeling with Postgres & Cassandra, Data Warehouse with Redshift,… ☆56 Updated 2 years ago
- A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for … ☆136 Updated 5 years ago
- ☆87 Updated 2 years ago
- Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow. ☆319 Updated 3 years ago
- Classwork projects and home works done through Udacity data engineering nano degree ☆74 Updated last year
- Udacity Data Engineering Nanodegree Capstone Project ☆36 Updated 5 years ago
- Projects done in the Data Engineering Nanodegree by Udacity.com ☆273 Updated 5 years ago
- 😈 Complete End to End ETL Pipeline with Spark, Airflow, & AWS ☆45 Updated 5 years ago
- Developed an ETL pipeline for a Data Lake that extracts data from S3, processes the data using Spark, and loads the data back into S3 as … ☆16 Updated 5 years ago
- Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow ☆144 Updated 4 years ago
- Udacity Data Engineering Nano Degree (DEND) ☆185 Updated 5 years ago
- Tracking and measuring neighborhood and district-level eviction rates in the city of San Francisco. ☆140 Updated 4 years ago
- A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract d… ☆24 Updated 3 years ago
- Mastering Big Data Analytics with PySpark, Published by Packt ☆160 Updated 8 months ago
- Ravi Azure ADB ADF Repository ☆66 Updated 3 months ago
- Developed a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation,… ☆90 Updated 3 years ago
- This project helps me to understand the core concepts of Apache Airflow. I have created custom operators to perform tasks such as staging… ☆86 Updated 5 years ago
- Fundamentals of Spark with Python (using PySpark), code examples ☆345 Updated 2 years ago
- PySpark Cheatsheet ☆36 Updated 2 years ago
- PySpark functions and utilities with examples. Assists ETL process of data modeling ☆103 Updated 4 years ago
- All Data Engineering notebooks from Datacamp course ☆115 Updated 5 years ago
- This repository will help you to learn about databricks concept with the help of examples. It will include all the important topics which… ☆98 Updated 9 months ago
- Hey this is the repo that has all the queries and data for my video game training series! ☆142 Updated 2 years ago
- Data Engineering Capstone ☆17 Updated 5 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆55 Updated 2 years ago
- ☆53 Updated 4 years ago
- RedditR for Content Engagement and Recommendation ☆22 Updated 7 years ago
- This repo is mostly created for pyspark and hive related interview questions. ☆47 Updated 3 years ago
- Udacity's 5 Month Data Engineering Nanodegree program. This repo includes all the projects completed. ☆27 Updated 4 years ago