This capstone project builds an end-to-end ETL (Extract, Transform, Load) data pipeline that extracts UK accident and traffic datasets from Amazon S3, cleans and transforms them with PySpark, writes the results back to S3, and finally loads them into Amazon Redshift (a distributed database), where the data can be queried for ad-hoc analyses.
☆18 · Jun 6, 2020 · Updated 5 years ago
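The final step described above, loading the transformed files from S3 into Redshift, is typically done with Redshift's `COPY` command. The following is a minimal sketch of how such a statement could be assembled, not the project's actual code: the function name `build_copy_statement`, the table name, the bucket path, and the IAM role ARN are all hypothetical, and it assumes the PySpark step wrote its output as Parquet.

```python
def build_copy_statement(table: str, s3_path: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads Parquet files from S3.

    All arguments are illustrative placeholders; the real project may use
    different table names, paths, file formats, or credentials.
    """
    if not s3_path.startswith("s3://"):
        raise ValueError(f"expected an s3:// path, got {s3_path!r}")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


if __name__ == "__main__":
    # Hypothetical values, shown only to illustrate the shape of the statement.
    print(
        build_copy_statement(
            table="accidents",
            s3_path="s3://example-bucket/transformed/accidents/",
            iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
        )
    )
```

The resulting string would be executed against the cluster with whichever SQL client the pipeline uses. Keeping statement construction in a plain function makes it easy to check without a live Redshift connection.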
Alternatives and similar repositories for UK_Accident_Traffic_ETL_Pipeline
Users who are interested in UK_Accident_Traffic_ETL_Pipeline are comparing it to the repositories listed below
- ☆12 · Sep 19, 2021 · Updated 4 years ago
- A summary of useful resources in order to learn about AI ☆10 · May 9, 2020 · Updated 5 years ago
- Spark + Python for Marketing Analytics ☆10 · Apr 19, 2017 · Updated 8 years ago
- All Coding project for CS6515 GA ☆14 · Jul 22, 2022 · Updated 3 years ago
- CICD pipeline that deploys a dbt image on a GKE cluster ☆11 · Jul 7, 2021 · Updated 4 years ago
- ZINDI GIZ NLP Agricultural Keyword Spotter 3rd place solution, Audio Classification ☆11 · Sep 8, 2021 · Updated 4 years ago
- Example project for consuming an AWS Kinesis stream and saving data to Amazon Redshift using Apache Spark ☆11 · May 22, 2018 · Updated 7 years ago
- Create a data pipeline on AWS to execute batch processing in a Spark cluster provisioned by Amazon EMR. ETL using managed airflow: extrac… ☆10 · Jul 12, 2021 · Updated 4 years ago
- ETL using Python in Jupyter Notebook, loading CSV, cleaning data, and saving to SQL Database. ☆14 · Nov 17, 2020 · Updated 5 years ago
- Package for Computational Biology Reading Group ☆13 · Apr 20, 2022 · Updated 3 years ago
- Solution to 6th place of audio classification competition (https://zindi.africa/competitions/giz-nlp-agricultural-keyword-spotter/leaderb… ☆12 · Dec 1, 2020 · Updated 5 years ago
- ☆14 · Aug 9, 2016 · Updated 9 years ago
- Teaching notes from my Advanced SQL workshops as local lead instructor at General Assembly New York. The first edition was created for th… ☆18 · Feb 14, 2020 · Updated 6 years ago
- This repository contains the 2nd place solution for the GIZ NLP word spotter competition organized by Zindi. ☆14 · Dec 3, 2020 · Updated 5 years ago
- A repo to track data engineering projects ☆13 · Nov 11, 2022 · Updated 3 years ago
- data visualizations and R code for #TidyTuesday 2021 ☆16 · Feb 4, 2022 · Updated 4 years ago
- A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousin… ☆15 · Apr 29, 2021 · Updated 4 years ago
- This Challenge aims to infer important COVID-19 public health risk factors from outdated data in South Africa ☆20 · Dec 8, 2022 · Updated 3 years ago
- A complete example of an AWS Glue application that uses the Serverless Framework to deploy the infrastructure and DevContainers and/or Do… ☆20 · Jul 22, 2025 · Updated 7 months ago
- Enriching Your Python Classes With Dunder (Magic, Special) Methods ☆20 · Jun 26, 2017 · Updated 8 years ago
- My Data Engineer Capstone project. A consolidated dataset with several jobs around the world. ☆13 · May 22, 2023 · Updated 2 years ago
- A GUI program to visualize sorting algorithms ☆27 · Apr 2, 2023 · Updated 2 years ago
- Developed an ETL pipeline for a Data Lake that extracts data from S3, processes the data using Spark, and loads the data back into S3 as … ☆17 · Oct 1, 2019 · Updated 6 years ago
- My Git Repo for Csv Data ☆21 · Oct 5, 2025 · Updated 5 months ago
- Code from https://pythonwise.blogspot.com ☆21 · Nov 23, 2023 · Updated 2 years ago
- For this project I am creating an ETL (Extract, Transform, and Load) pipeline using Python, RegEx, and SQL Database. The goal is to retri… ☆26 · Feb 9, 2021 · Updated 5 years ago
- An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform. ☆1,495 · Mar 9, 2020 · Updated 6 years ago
- Tweepy Stream Example ☆19 · Apr 23, 2019 · Updated 6 years ago
- A simple Spark standalone cluster for your testing environment purposes ☆23 · Jul 25, 2020 · Updated 5 years ago
- Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin… ☆25 · Aug 11, 2023 · Updated 2 years ago
- A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract d… ☆24 · Nov 22, 2021 · Updated 4 years ago
- A real-time streaming ETL pipeline for streaming and performing sentiment analysis on Twitter data using Apache Kafka, Apache Spark and D… ☆29 · Aug 8, 2020 · Updated 5 years ago
- Spark data pipeline that processes movie ratings data. ☆31 · Mar 1, 2026 · Updated last week
- Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary. ☆29 · Aug 14, 2023 · Updated 2 years ago
- A real-time event pipeline around Kafka Ecosystem for Chicago Transit Authority. ☆32 · Aug 14, 2023 · Updated 2 years ago
- Code and Plots for #TidyTuesday ☆37 · Aug 31, 2022 · Updated 3 years ago
- Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow. ☆347 · Jan 12, 2022 · Updated 4 years ago
- Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for A… ☆41 · Jul 6, 2022 · Updated 3 years ago
- Ravi Azure ADB ADF Repository ☆64 · Jan 25, 2025 · Updated last year