AdeboyeML/UK_Accident_Traffic_ETL_Pipeline


AdeboyeML / UK_Accident_Traffic_ETL_Pipeline

This capstone project builds an end-to-end ETL (Extract-Transform-Load) data pipeline that extracts UK accident and traffic datasets from Amazon S3, cleans and transforms them with PySpark, writes the results back to S3, and finally loads them into Amazon Redshift (a distributed database), where the data can be queried for ad hoc analyses.

☆18
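
The stages named in the description above (extract, clean and transform, stage, load) can be sketched roughly as follows. This is a minimal illustration and not the repository's code: it uses plain Python in place of PySpark so it runs without a cluster, and the column names, bucket path, table name, and IAM role are all hypothetical. The only real interface assumed is the shape of Redshift's `COPY ... FROM 's3://...' IAM_ROLE ... FORMAT AS CSV` statement.

```python
import csv
import io


def clean_rows(rows):
    """Drop rows with any missing value and strip stray whitespace."""
    cleaned = []
    for row in rows:
        stripped = {key: (value or "").strip() for key, value in row.items()}
        if all(stripped.values()):
            cleaned.append(stripped)
    return cleaned


def build_copy_statement(table, s3_path, iam_role):
    """Build a Redshift COPY statement that loads CSV files from S3."""
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV IGNOREHEADER 1;"
    )


# Stand-in for a raw file extracted from S3 (hypothetical columns and values).
RAW_CSV = (
    "accident_id,severity,road_type\n"
    "A1, Serious ,Single carriageway\n"
    "A2,,Dual carriageway\n"
)

# Extract: parse the raw CSV into dictionaries keyed by column name.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Transform: the row with an empty severity is dropped, whitespace is trimmed.
cleaned = clean_rows(rows)

# Load: the statement a client would send to Redshift once the cleaned
# files are back in S3 (bucket, table, and role are placeholders).
copy_sql = build_copy_statement(
    "accidents",
    "s3://example-bucket/clean/accidents/",
    "arn:aws:iam::123456789012:role/example-redshift-role",
)
```

In a Spark-based pipeline the `clean_rows` step would typically be a DataFrame transformation instead, and the cleaned output would be written back to S3 before the `COPY` statement is issued.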

Alternatives and similar repositories for UK_Accident_Traffic_ETL_Pipeline

Users who are interested in UK_Accident_Traffic_ETL_Pipeline are comparing it to the repositories listed below.


markwsutton / ETL-using-Python-SQL
ETL using Python in Jupyter Notebook, loading CSV, cleaning data, and saving to SQL Database.
☆14 · Nov 17, 2020 · Updated 5 years ago
pfjob09 / MarketingAnalytics
Spark + Python for Marketing Analytics
☆10 · Apr 19, 2017 · Updated 9 years ago
vsouza / spark-kinesis-redshift
Example project for consuming an AWS Kinesis stream and saving the data to Amazon Redshift using Apache Spark
☆11 · May 22, 2018 · Updated 8 years ago
boom-deva / Teaching_Advanced_SQL
Teaching notes from my Advanced SQL workshops as local lead instructor at General Assembly New York. The first edition was created for th…
☆19 · Feb 14, 2020 · Updated 6 years ago
ronantakizawa / kanomaly
Time Series Anomaly Detection using a Kolmogorov-Arnold Network
☆27 · May 21, 2025 · Updated last year
Mazen72 / Twitter_Sentiment_Analysis_Dashboard
☆13 · Jun 23, 2022 · Updated 4 years ago
velascoluis / dbt-ci-cd-gke
CI/CD pipeline that deploys a dbt image on a GKE cluster
☆11 · Jul 7, 2021 · Updated 5 years ago
shravan-kuchkula / dataEngineering
A repo to track data engineering projects
☆14 · Nov 11, 2022 · Updated 3 years ago
chuqiaoshen / Git-Influencer
Insight Data Engineering project: A platform built in HDFS, Spark and Airflow to help you to find social influencers from GitHub Net…
☆16 · May 21, 2024 · Updated 2 years ago
ankit-rathi / Self-Starter-Handbook
Build Your Own Roadmap
☆11 · Jul 8, 2020 · Updated 6 years ago
PiotrTa / Mining-Massive-Datasets
Data mining algorithms with Python
☆10 · Jun 26, 2019 · Updated 7 years ago
coveooss / shopper-intent-prediction-nature-2020
🏟
☆28 · Nov 11, 2020 · Updated 5 years ago
dtsdwarak / cs_prep
Guide to CS Engineering and Interview Prep
☆18 · Dec 26, 2024 · Updated last year
san089 / goodreads_etl_pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
☆1,535 · Mar 9, 2020 · Updated 6 years ago
jonathanhayes / Tweepy-Twitter-Stream-Example
Tweepy Stream Example
☆19 · Apr 23, 2019 · Updated 7 years ago
alanchn31 / Loan-Default-Prediction
Loan Default Prediction using PySpark, with jobs scheduled by Apache Airflow and Integration with Spark using Apache Livy
☆22 · Dec 26, 2020 · Updated 5 years ago
baldFemale / LeetCode-Solution
LeetCode solutions for weekly contests
☆16 · Jan 11, 2020 · Updated 6 years ago
peterroelants / notebooks
Collection of notebooks
☆17 · Oct 27, 2024 · Updated last year
rapid7 / le_lambda
☆16 · Mar 5, 2025 · Updated last year
PacktPublishing / Apache-Spark-2-for-Beginners
Apache Spark 2 for Beginners, published by Packt
☆33 · Oct 31, 2022 · Updated 3 years ago
friedue / course_RNA-seq2015
☆14 · Aug 9, 2016 · Updated 9 years ago
arunchaganty / Small-World-RL
Exploring the use of options in creating small worlds for faster learning in RL Domains
☆16 · Jan 23, 2012 · Updated 14 years ago
tayganr / lakehouse
https://aka.ms/lakehouselab
☆23 · Feb 14, 2023 · Updated 3 years ago
dukecct / CBRG
Package for Computational Biology Reading Group
☆14 · Apr 20, 2022 · Updated 4 years ago
shravan-kuchkula / udacity-data-eng-proj2
A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract d…
☆24 · Nov 22, 2021 · Updated 4 years ago
shravan-kuchkula / udacity-data-eng-proj3
Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary.
☆29 · Aug 14, 2023 · Updated 2 years ago
jamesbyars / apache-spark-etl-pipeline-example
Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin…
☆24 · Aug 11, 2023 · Updated 2 years ago
aaronbatchelder / product-management-case-studies
An open-source repo of product management case studies.
☆29 · Updated this week
san089 / Optimizing-Public-Transportation
A real-time event pipeline around Kafka Ecosystem for Chicago Transit Authority.
☆33 · Aug 14, 2023 · Updated 2 years ago
AbstractFuture / 100DaysOfCloud
Steven's 100DaysOfCloudRepo
☆17 · Nov 22, 2020 · Updated 5 years ago
PacktPublishing / Mastering-Spark-for-Data-Science
Mastering Spark for Data Science, published by Packt
☆51 · Apr 22, 2026 · Updated 3 months ago
pybites / dunders
Enriching Your Python Classes With Dunder (Magic, Special) Methods
☆20 · Jun 26, 2017 · Updated 9 years ago
guidok91 / spark-movies-etl
Spark data pipeline that processes movie ratings data.
☆31 · Updated this week
PacktPublishing / Apache-Spark-3-for-Data-Engineering-and-Analytics-with-Python-
Apache Spark 3 for Data Engineering and Analytics with Python, by Packt Publishing
☆24 · Jul 23, 2023 · Updated 3 years ago
Thinkful-Ed / big-data-student-resources
These are the Jupyter notebooks for the Big Data specialization in the Data Science Program.
☆15 · Apr 3, 2020 · Updated 6 years ago
gabfr / work-around-the-world
My Data Engineer Capstone project. A consolidated dataset with several jobs around the world.
☆13 · May 22, 2023 · Updated 3 years ago
tebeka / pythonwise
Code from https://pythonwise.blogspot.com
☆21 · Nov 23, 2023 · Updated 2 years ago
garystafford / aws-airflow-demo
Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for A…
☆41 · Jul 6, 2022 · Updated 4 years ago
mohammadst99 / SelfDraiving_FindingLane
In this project we find and detect the lane and identify the area that the car should be in.
☆24 · Mar 28, 2022 · Updated 4 years ago