This capstone project builds an end-to-end ETL (Extract-Transform-Load) data pipeline that extracts UK accident and traffic datasets from Amazon S3, cleans and transforms them with PySpark, writes the results back to S3, and finally loads them into Amazon Redshift (a distributed database), where the data can be queried for ad-hoc analyses.
☆18Jun 6, 2020Updated 5 years ago
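The clean-then-load steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: plain Python stands in for the PySpark transform so the sketch runs without a cluster, and the column names, table name, bucket path, and IAM role are all hypothetical. The only AWS-specific piece is the general shape of a Redshift `COPY ... FROM ... IAM_ROLE ... FORMAT AS PARQUET` statement.

```python
from typing import Optional


def clean_record(row: dict) -> Optional[dict]:
    """Normalize one raw accident record; return None if it is unusable.

    The column names below are assumptions made for illustration only.
    """
    accident_id = (row.get("Accident_Index") or "").strip()
    if not accident_id:
        # Drop rows that have no identifier.
        return None
    try:
        casualties = int(row.get("Number_of_Casualties", ""))
    except (TypeError, ValueError):
        # Drop rows whose casualty count is missing or not an integer.
        return None
    return {"accident_index": accident_id, "number_of_casualties": casualties}


def build_copy_statement(table: str, s3_path: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement for Parquet files staged in S3."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS PARQUET;"
    )


if __name__ == "__main__":
    raw_rows = [
        {"Accident_Index": " A1 ", "Number_of_Casualties": "2"},
        {"Accident_Index": "", "Number_of_Casualties": "1"},
    ]
    # Keep only the rows that survive cleaning.
    cleaned = [r for r in (clean_record(row) for row in raw_rows) if r is not None]
    print(cleaned)
    # Hypothetical table, bucket, and role, used only to show the statement shape.
    print(build_copy_statement(
        "accidents",
        "s3://example-bucket/clean/accidents/",
        "arn:aws:iam::123456789012:role/example-redshift-role",
    ))
```

In a real pipeline the cleaning would run as a PySpark job over the whole dataset, and the generated statement would be executed against the Redshift cluster through a database client.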
Alternatives and similar repositories for UK_Accident_Traffic_ETL_Pipeline
Users that are interested in UK_Accident_Traffic_ETL_Pipeline are comparing it to the libraries listed below.
- Implementing an end-to-end tweets ETL/analysis pipeline.☆59Dec 8, 2022Updated 3 years ago
- ZINDI GIZ NLP Agricultural Keyword Spotter 3rd place solution, Audio Classification☆11Sep 8, 2021Updated 4 years ago
- The gaming industry is certainly one of the thriving industries of the modern age and one of those that are most influenced by the advanc…☆12Jun 29, 2020Updated 5 years ago
- ☆15Dec 2, 2020Updated 5 years ago
- This repository contains the 2nd place solution for the GIZ NLP word spotter competition organized by Zindi.☆14Dec 3, 2020Updated 5 years ago
- Spark + Python for Marketing Analytics☆10Apr 19, 2017Updated 8 years ago
- Example project for consuming an AWS Kinesis stream and saving the data to Amazon Redshift using Apache Spark☆11May 22, 2018Updated 7 years ago
- First course of Craftech Academy - March 2021☆11Aug 3, 2021Updated 4 years ago
- A summary of useful resources in order to learn about AI☆10May 9, 2020Updated 5 years ago
- Create a data pipeline on AWS to execute batch processing in a Spark cluster provisioned by Amazon EMR. ETL using managed airflow: extrac…☆10Jul 12, 2021Updated 4 years ago
- This Challenge aims to infer important COVID-19 public health risk factors from outdated data in South Africa☆20Dec 8, 2022Updated 3 years ago
- An open-source repo of product management case studies.☆23Updated this week
- This repository explains how to predict customer churn. A hackathon organized by Data Science Nigeria (DSN-AI) to help Expresso predict c…☆21Oct 17, 2021Updated 4 years ago
- Teaching notes from my Advanced SQL workshops as local lead instructor at General Assembly New York. The first edition was created for th…☆18Feb 14, 2020Updated 6 years ago
- A repo to track data engineering projects☆13Nov 11, 2022Updated 3 years ago
- CICD pipeline that deploys a dbt image on a GKE cluster☆11Jul 7, 2021Updated 4 years ago
- Insight Data Engineering project: A platform built in HDFS, Spark and Airflow to help you to find social influencers from GitHub Net…☆16May 21, 2024Updated last year
- Build Your Own Roadmap☆11Jul 8, 2020Updated 5 years ago
- A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousin…☆15Apr 29, 2021Updated 4 years ago
- Data mining algorithms with Python☆10Jun 26, 2019Updated 6 years ago
- ☆13Jun 23, 2022Updated 3 years ago
- A Data Visualization project on the French traffic accidents database☆19Aug 27, 2019Updated 6 years ago
- Roadmap to becoming a web developer in 2017, in Spanish☆15Jun 16, 2017Updated 8 years ago
- Developed an ETL pipeline for a Data Lake that extracts data from S3, processes the data using Spark, and loads the data back into S3 as …☆17Oct 1, 2019Updated 6 years ago
- Tweepy Stream Example☆19Apr 23, 2019Updated 6 years ago
- An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.☆1,497Mar 9, 2020Updated 6 years ago
- This repository contains everything you need to become proficient in System Design and Case Studies with Code Implementation☆18Jan 27, 2024Updated 2 years ago
- Loan Default Prediction using PySpark, with jobs scheduled by Apache Airflow and Integration with Spark using Apache Livy☆22Dec 26, 2020Updated 5 years ago
- Samples of ML models learning from source code☆20Nov 28, 2022Updated 3 years ago
- Leetcode solution for weekly contest☆16Jan 11, 2020Updated 6 years ago
- All Coding project for CS6515 GA☆14Jul 22, 2022Updated 3 years ago
- Apache Spark 2 for Beginners, published by Packt☆33Oct 31, 2022Updated 3 years ago
- data visualizations and R code for #TidyTuesday 2021☆16Feb 4, 2022Updated 4 years ago
- A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract d…☆24Nov 22, 2021Updated 4 years ago
- Package for Computational Biology Reading Group☆13Apr 20, 2022Updated 3 years ago
- Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary.☆29Aug 14, 2023Updated 2 years ago
- A complete example of an AWS Glue application that uses the Serverless Framework to deploy the infrastructure and DevContainers and/or Do…☆20Updated this week
- Steven's 100DaysOfCloudRepo☆17Nov 22, 2020Updated 5 years ago
- A real-time event pipeline around Kafka Ecosystem for Chicago Transit Authority.☆32Aug 14, 2023Updated 2 years ago