This capstone project builds an end-to-end ETL (Extract-Transform-Load) data pipeline that extracts UK accident and traffic datasets from Amazon S3, cleans and transforms them with PySpark, writes the results back to S3, and finally loads them into Amazon Redshift (a distributed database), where the data can be queried for ad hoc analyses.
☆18 · Jun 6, 2020 · Updated 6 years ago
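
The final load step described above could be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the table name, S3 prefix, IAM role ARN, and the choice of Parquet as the intermediate format are all assumptions, and `CopyJob` and `build_copy_sql` are hypothetical helpers introduced here. The sketch only builds a Redshift `COPY` statement; running it would additionally require a database connection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyJob:
    """One Redshift COPY from an S3 prefix (all values are hypothetical)."""

    table: str          # target Redshift table
    s3_prefix: str      # S3 location of the transformed files
    iam_role_arn: str   # role Redshift assumes to read from S3


def build_copy_sql(job: CopyJob) -> str:
    """Return a COPY statement that loads Parquet files from S3.

    Assumption: the transform step wrote Parquet; the source does not
    state which file format the pipeline uses.
    """
    return (
        f"COPY {job.table}\n"
        f"FROM '{job.s3_prefix}'\n"
        f"IAM_ROLE '{job.iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = CopyJob(
        table="accidents",
        s3_prefix="s3://example-bucket/clean/accidents/",
        iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-role",
    )
    print(build_copy_sql(example))
```

Keeping the statement builder separate from the code that executes it makes the load step easy to check without a live cluster.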
Alternatives and similar repositories for UK_Accident_Traffic_ETL_Pipeline
Users that are interested in UK_Accident_Traffic_ETL_Pipeline are comparing it to the libraries listed below.
- implementing an end-to-end tweets ETL/Analysis pipeline. ☆59 · Dec 8, 2022 · Updated 3 years ago
- ETL using Python in Jupyter Notebook, loading CSV, cleaning data, and saving to SQL Database. ☆14 · Nov 17, 2020 · Updated 5 years ago
- A summary of useful resources in order to learn about AI ☆10 · May 9, 2020 · Updated 6 years ago
- For this project I am creating an ETL (Extract, Transform, and Load) pipeline using Python, RegEx, and SQL Database. The goal is to retri… ☆26 · Feb 9, 2021 · Updated 5 years ago
- A repo to track data engineering projects ☆13 · Nov 11, 2022 · Updated 3 years ago
- Insight Data Engineering project: A platform built in HDFS, Spark and Airflow to help you to find social influencers from GitHub Net… ☆16 · May 21, 2024 · Updated 2 years ago
- Time Series Anomaly Detection using a Kolmogorov-Arnold Network ☆27 · May 21, 2025 · Updated last year
- An open-source repo to product management case studies. ☆26 · Jun 11, 2026 · Updated last week
- Build Your Own Roadmap ☆11 · Jul 8, 2020 · Updated 5 years ago
- A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousin… ☆15 · Apr 29, 2021 · Updated 5 years ago
- Data mining algorithms with Python ☆10 · Jun 26, 2019 · Updated 6 years ago
- ☆13 · Jun 23, 2022 · Updated 3 years ago
- 🏟 ☆28 · Nov 11, 2020 · Updated 5 years ago
- Developed an ETL pipeline for a Data Lake that extracts data from S3, processes the data using Spark, and loads the data back into S3 as … ☆17 · Oct 1, 2019 · Updated 6 years ago
- Self-study plan to achieve mastery in data science ☆31 · Mar 29, 2023 · Updated 3 years ago
- Tweepy Stream Example ☆19 · Apr 23, 2019 · Updated 7 years ago
- Guide to CS Engineering and Interview Prep ☆18 · Dec 26, 2024 · Updated last year
- An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform. ☆1,513 · Mar 9, 2020 · Updated 6 years ago
- This repository contains everything you need to become proficient in System Design and Case Studies with Code Implementation ☆18 · Jan 27, 2024 · Updated 2 years ago
- Loan Default Prediction using PySpark, with jobs scheduled by Apache Airflow and Integration with Spark using Apache Livy ☆22 · Dec 26, 2020 · Updated 5 years ago
- Leetcode solution for weekly contest ☆16 · Jan 11, 2020 · Updated 6 years ago
- Samples of ML models learning from source code ☆20 · Nov 28, 2022 · Updated 3 years ago
- Collection of notebooks ☆17 · Oct 27, 2024 · Updated last year
- A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract d… ☆24 · Nov 22, 2021 · Updated 4 years ago
- Package for Computational Biology Reading Group ☆14 · Apr 20, 2022 · Updated 4 years ago
- ☆16 · Mar 5, 2025 · Updated last year
- All Coding project for CS6515 GA ☆15 · Jul 22, 2022 · Updated 3 years ago
- Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary. ☆29 · Aug 14, 2023 · Updated 2 years ago
- A complete example of an AWS Glue application that uses the Serverless Framework to deploy the infrastructure and DevContainers and/or Do… ☆20 · Jun 8, 2026 · Updated last week
- Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin… ☆24 · Aug 11, 2023 · Updated 2 years ago
- ☆14 · Aug 9, 2016 · Updated 9 years ago
- Steven's 100DaysOfCloudRepo ☆17 · Nov 22, 2020 · Updated 5 years ago
- A real-time event pipeline around Kafka Ecosystem for Chicago Transit Authority. ☆33 · Aug 14, 2023 · Updated 2 years ago
- Enriching Your Python Classes With Dunder (Magic, Special) Methods ☆20 · Jun 26, 2017 · Updated 8 years ago
- Mastering Spark for Data Science, published by Packt ☆50 · Apr 22, 2026 · Updated last month
- Spark data pipeline that processes movie ratings data. ☆31 · May 1, 2026 · Updated last month
- A real-time streaming ETL pipeline for streaming and performing sentiment analysis on Twitter data using Apache Kafka, Apache Spark and D… ☆29 · Aug 8, 2020 · Updated 5 years ago
- These are the Jupyter notebooks for the Big Data specialization in the Data Science Program. ☆15 · Apr 3, 2020 · Updated 6 years ago
- ☆30 · Jan 23, 2019 · Updated 7 years ago