This is a capstone project that builds an end-to-end ETL (Extract-Transform-Load) data pipeline. The pipeline extracts UK accident and traffic datasets from Amazon S3, cleans and transforms them with PySpark, writes the results back to S3, and finally loads them into Amazon Redshift (a distributed database), where the data can be queried for ad-hoc analyses.
☆18 · Jun 6, 2020 · Updated 5 years ago
Alternatives and similar repositories for UK_Accident_Traffic_ETL_Pipeline
Users that are interested in UK_Accident_Traffic_ETL_Pipeline are comparing it to the libraries listed below.
- Implementing an end-to-end tweets ETL/Analysis pipeline. ☆59 · Dec 8, 2022 · Updated 3 years ago
- ETL using Python in Jupyter Notebook, loading CSV, cleaning data, and saving to SQL Database. ☆14 · Nov 17, 2020 · Updated 5 years ago
- Spark + Python for Marketing Analytics ☆10 · Apr 19, 2017 · Updated 9 years ago
- Example project for consuming an AWS Kinesis stream and saving data to Amazon Redshift using Apache Spark ☆11 · May 22, 2018 · Updated 7 years ago
- First Craftech Academy course - March 2021 ☆11 · Aug 3, 2021 · Updated 4 years ago
- A summary of useful resources in order to learn about AI ☆10 · May 9, 2020 · Updated 6 years ago
- Create a data pipeline on AWS to execute batch processing in a Spark cluster provisioned by Amazon EMR. ETL using managed airflow: extrac… ☆10 · Jul 12, 2021 · Updated 4 years ago
- For this project I am creating an ETL (Extract, Transform, and Load) pipeline using Python, RegEx, and SQL Database. The goal is to retri… ☆26 · Feb 9, 2021 · Updated 5 years ago
- Teaching notes from my Advanced SQL workshops as local lead instructor at General Assembly New York. The first edition was created for th… ☆18 · Feb 14, 2020 · Updated 6 years ago
- A repo to track data engineering projects ☆13 · Nov 11, 2022 · Updated 3 years ago
- CICD pipeline that deploys a dbt image on a GKE cluster ☆11 · Jul 7, 2021 · Updated 4 years ago
- Insight Data Engineering project: A platform built in HDFS, Spark and Airflow to help you to find social influencers from GitHub Net… ☆16 · May 21, 2024 · Updated last year
- Build Your Own Roadmap ☆11 · Jul 8, 2020 · Updated 5 years ago
- A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousin… ☆15 · Apr 29, 2021 · Updated 5 years ago
- HTML5 widgets for WTForms ☆25 · Apr 27, 2026 · Updated last week
- Roadmap to becoming a web developer in 2017, in Spanish (Roadmap para ser un desarrollador web en el 2017) ☆15 · Jun 16, 2017 · Updated 8 years ago
- Developed an ETL pipeline for a Data Lake that extracts data from S3, processes the data using Spark, and loads the data back into S3 as … ☆17 · Oct 1, 2019 · Updated 6 years ago
- Tweepy Stream Example ☆19 · Apr 23, 2019 · Updated 7 years ago
- An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform. ☆1,508 · Mar 9, 2020 · Updated 6 years ago
- This repository contains everything you need to become proficient in System Design and Case Studies with Code Implementation ☆18 · Jan 27, 2024 · Updated 2 years ago
- Loan Default Prediction using PySpark, with jobs scheduled by Apache Airflow and Integration with Spark using Apache Livy ☆22 · Dec 26, 2020 · Updated 5 years ago
- All Coding project for CS6515 GA ☆14 · Jul 22, 2022 · Updated 3 years ago
- data visualizations and R code for #TidyTuesday 2021 ☆16 · Feb 4, 2022 · Updated 4 years ago
- Package for Computational Biology Reading Group ☆13 · Apr 20, 2022 · Updated 4 years ago
- ☆16 · Mar 5, 2025 · Updated last year
- A complete example of an AWS Glue application that uses the Serverless Framework to deploy the infrastructure and DevContainers and/or Do… ☆20 · May 1, 2026 · Updated last week
- Enriching Your Python Classes With Dunder (Magic, Special) Methods ☆20 · Jun 26, 2017 · Updated 8 years ago
- ☆29 · Jan 23, 2019 · Updated 7 years ago
- A real-time streaming ETL pipeline for streaming and performing sentiment analysis on Twitter data using Apache Kafka, Apache Spark and D… ☆29 · Aug 8, 2020 · Updated 5 years ago
- These are the Jupyter notebooks for the Big Data specialization in the Data Science Program. ☆15 · Apr 3, 2020 · Updated 6 years ago
- SageMaker specific extensions to TensorFlow. ☆54 · Jul 23, 2024 · Updated last year
- Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for A… ☆41 · Jul 6, 2022 · Updated 3 years ago
- Code from https://pythonwise.blogspot.com ☆21 · Nov 23, 2023 · Updated 2 years ago
- Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow. ☆351 · Jan 12, 2022 · Updated 4 years ago
- My Git Repo for Csv Data ☆21 · Oct 5, 2025 · Updated 7 months ago
- This is where we put useful code for our daily job with data. ☆28 · Mar 19, 2025 · Updated last year
- A simple Spark standalone cluster for your testing environment purposes ☆23 · Jul 25, 2020 · Updated 5 years ago
- SQL data model for working with Snowplow web data. Supports Redshift and Looker. Snowflake and BigQuery coming soon ☆60 · Dec 1, 2020 · Updated 5 years ago
- This repository contains all the python projects done as a tutorial ☆27 · Mar 27, 2025 · Updated last year