judeleonard / Prescriber-ETL-data-pipeline
An end-to-end ETL data pipeline that uses PySpark parallel processing to process about 25 million rows of data from a SaaS application. Apache Airflow orchestrates the pipeline, various data warehouse technologies store the results, and Apache Superset connects to the data warehouse (DWH) to generate BI dashboards for weekly reports.
☆23 · Updated last year
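The architecture described above comes down to three stages: extract, transform, and load. Below is a minimal plain-Python sketch of that flow, not the repository's code. It assumes a small hypothetical CSV extract with made-up column names (`prescriber_id`, `state`, `drug_name`, `total_claims`). It uses the standard-library `csv` module where the pipeline would use PySpark, runs the stages in sequence where Airflow would schedule them as dependent tasks, and writes to an in-memory SQLite table that stands in for the data warehouse.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical sample standing in for rows exported by the SaaS application.
# The column names are assumptions for illustration only.
RAW_CSV = """prescriber_id,state,drug_name,total_claims
1001,NY,DrugA,12
1001,NY,DrugB,8
1002,CA,DrugA,
1003,TX,DrugC,20
1003,TX,DrugA,5
"""


def extract(raw: str) -> list[dict]:
    """Extract: parse raw CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple[str, str, int]]:
    """Transform: drop rows with missing claims, cast types, sum claims per prescriber."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for row in rows:
        if not row["total_claims"]:
            continue  # skip incomplete records
        totals[(row["prescriber_id"], row["state"])] += int(row["total_claims"])
    # Sort by (prescriber_id, state) so the output order is deterministic.
    return [(pid, state, claims) for (pid, state), claims in sorted(totals.items())]


def load(records: list[tuple[str, str, int]], conn: sqlite3.Connection) -> None:
    """Load: write aggregated records into a reporting table (SQLite stands in for the DWH)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prescriber_summary "
        "(prescriber_id TEXT, state TEXT, total_claims INTEGER)"
    )
    conn.executemany("INSERT INTO prescriber_summary VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(raw: str, conn: sqlite3.Connection) -> None:
    """Run the stages in order; an orchestrator such as Airflow would run them as separate tasks."""
    load(transform(extract(raw)), conn)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    run_pipeline(RAW_CSV, connection)
    for summary_row in connection.execute("SELECT * FROM prescriber_summary"):
        print(summary_row)
```

Running the script prints `('1001', 'NY', 20)` and `('1003', 'TX', 25)`; prescriber 1002 is dropped because its claim count is missing. Keeping each stage in its own function mirrors how an orchestrator splits a pipeline into tasks, each of which could be scheduled or retried on its own.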
Related projects:
- Code for "Advanced data transformations in SQL" free live workshop ☆54 · Updated last month
- This repository will help you learn Databricks concepts through examples. It will include all the important topics which… ☆91 · Updated last month
- A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from loc… ☆20 · Updated 2 years ago
- Code for blog at https://www.startdataengineering.com/post/python-for-de/ ☆47 · Updated 3 months ago
- Sample project to demonstrate data engineering best practices ☆156 · Updated 6 months ago
- End-to-end data engineering project ☆49 · Updated last year
- Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow ☆127 · Updated 4 years ago
- Glue ETL job or EMR Spark job that reads from the Data Catalog, modifies the data, and uploads it to S3 and the Data Catalog ☆11 · Updated last year
- Recohut - Learn data engineering, data science ☆93 · Updated last year
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ☆24 · Updated 9 months ago
- Ravi Azure ADB ADF Repository ☆64 · Updated 4 months ago
- Project for "Data pipeline design patterns" blog. ☆41 · Updated last month
- End-to-end data engineering project with Kafka, Airflow, Spark, Postgres, and Docker. ☆54 · Updated last month
- With everything I learned from DEZoomcamp from datatalks.club, this project performs batch processing on AWS for the cycling dataset wh… ☆12 · Updated 2 years ago
- Data pipeline that scrapes Rust cheater Steam profiles ☆50 · Updated 2 years ago
- PySpark Cheatsheet ☆35 · Updated last year
- Code for dbt tutorial ☆138 · Updated 3 months ago
- In this project, we will build an ETL (Extract, Transform, Load) pipeline using the Spotify API on AWS. The pipeline will retrieve data fro… ☆21 · Updated last year
- Code for my "Efficient Data Processing in SQL" book. ☆47 · Updated last month
- 😈 Complete End to End ETL Pipeline with Spark, Airflow, & AWS ☆39 · Updated 5 years ago