judeleonard/Prescriber-ETL-data-pipeline

An end-to-end ETL data pipeline that uses PySpark parallel processing to handle about 25 million rows of data from a SaaS application. Apache Airflow orchestrates the pipeline alongside various data warehouse technologies, and Apache Superset connects to the data warehouse (DWH) to generate BI dashboards for weekly reports.

☆27

Alternatives and similar repositories for Prescriber-ETL-data-pipeline

Users who are interested in Prescriber-ETL-data-pipeline are comparing it to the repositories listed below.

ismaildawoodjee / aws-data-pipeline
A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from loc…
☆24 · May 14, 2022 · Updated 4 years ago
dogukannulu / glue_etl_job_data_catalog_s3
Glue ETL job or EMR Spark that gets from data catalog, modifies and uploads to S3 and Data Catalog
☆13 · Aug 26, 2023 · Updated 2 years ago
markwsutton / ETL-using-Python-SQL
ETL using Python in Jupyter Notebook, loading CSV, cleaning data, and saving to SQL Database.
☆14 · Nov 17, 2020 · Updated 5 years ago
siddharth271101 / Covid-19-and-Aviation-Industry
The goal of this project is to analyse the impact of Covid-19 on the Aviation industry through data engineering processes using technolog…
☆13 · Jun 26, 2022 · Updated 4 years ago
DSKunth / ETL-Pipeline
This repository contains tasks on how to build an ETL pipeline for the online transaction data of an e-commerce company.
☆19 · Jun 27, 2023 · Updated 3 years ago
datamarts / prostore
Project is in active development and has been moved to https://repository.datamart.ru/datamarts/prostore.
☆17 · Apr 22, 2022 · Updated 4 years ago
farrellwahyudi / Predicting-Ad-Clicks-Classification-by-Using-Machine-Learning
In this project I used ML modeling and data analysis to predict ad clicks and significantly improve ad campaign performance, resulting in…
☆13 · Nov 6, 2023 · Updated 2 years ago
kamilmuratyilmaz / earthquake-location-from-image
☆12 · Feb 6, 2023 · Updated 3 years ago
JTupitza-UVA / DS-2002
☆43 · Apr 30, 2026 · Updated 2 months ago
AnkitKumarSingh11 / Data-Structures-And-Algorithms
This project contains my solution for all the data structures and algorithms on Algo Expert, Hackerrank and Leetcode. This repository is …
☆10 · Jan 24, 2021 · Updated 5 years ago
bhavyabhagerathi / Invoice-Data-Extraction-Bot-using-LLAMA-2-and-Streamlit
Automates the tedious task of extracting crucial information from invoices with the Invoice Data Extraction Bot.
☆12 · Feb 7, 2024 · Updated 2 years ago
supratim94336 / DataEngineeringCapstoneProject
😈 Complete End to End ETL Pipeline with Spark, Airflow, & AWS
☆51 · Aug 23, 2019 · Updated 6 years ago
Stefen-Taime / ETL-Data-Pipeline-RDBMS-TO-HDFS-using-Airflow-Apache-Sqoop-Spark-Postgres-and-Hive
This project aims to move the data from a Relational database system (RDBMS) to a Hadoop file system (HDFS)
☆11 · Apr 29, 2022 · Updated 4 years ago
mikahanninen / robots
☆14 · Nov 11, 2023 · Updated 2 years ago
sumitjoshi21 / NSE-Real-Time-Stocks-Analysis-and-Predictions-Using-Python-LSTM-Model
In this project first we fetch data of any stock(NSE) in realtime then we evaluate the stock price using basics visualizations then we…
☆12 · Mar 24, 2023 · Updated 3 years ago
gispada / nestjs-python-kafka-microservices
Project to experiment with a microservices architecture based on Apache Kafka
☆23 · Jul 8, 2023 · Updated 3 years ago
Krishnamohan-Yerrabilli / Deployment-on-K8s-cluster-using-jenkins-CI-CD
In this project, we will be deploying a Kubernetes cluster using a Jenkins CI/CD pipeline. We will be utilizing various DevOps tools such…
☆13 · Jun 6, 2023 · Updated 3 years ago
shenxiangzhuang / toydata
Data Structures in Python
☆10 · Jul 13, 2026 · Updated 2 weeks ago
PontusHultkrantz / price-discovery
Mid price estimation in LOB using Markov model
☆13 · May 11, 2022 · Updated 4 years ago
skoonData / docker-compose
☆12 · Jul 27, 2021 · Updated 5 years ago
rajeshmore1 / DataScience_Mentorship
Course Material - Data Science Program
☆15 · Mar 31, 2026 · Updated 3 months ago
AhmetFurkanDEMIR / dataengineering-youtube-project
Data Engineering Youtube Project
☆12 · Jun 29, 2023 · Updated 3 years ago
enessoztrk / WhatsApp_Chat_Analysis_Heroku_Deployment
☆15 · Nov 10, 2022 · Updated 3 years ago
mfarragher / curated-data-science-resources
People ask me about data science resources so I've curated some here: this is <<20% of the size of an 'awesome' list but has 80% of the v…
☆11 · Jan 14, 2023 · Updated 3 years ago
halitkalayci / selenium-advanced
☆12 · Dec 29, 2022 · Updated 3 years ago
damklis / etljob
Simple ETL pipeline using Python
☆29 · May 22, 2023 · Updated 3 years ago
ongxuanhong / de02-pyspark-optimization
☆14 · Mar 11, 2023 · Updated 3 years ago
raizen-analytics / data-engineering-test
☆24 · Dec 4, 2023 · Updated 2 years ago
sbalnojan / FDE-airflow-tutorial
Functional Data Engineering tutorial in Python & Airflow.
☆17 · Mar 24, 2023 · Updated 3 years ago
PetroIvaniuk / awesome-ml
List of interesting links about ML Algorithms, Data Science, Network Analysis, and others.
☆13 · May 9, 2023 · Updated 3 years ago
sivamsinghsh / AI-Chatbot-With-ChatGPT-API
In this tutorial, we have added step-by-step instructions to build your own AI chatbot with ChatGPT API. From setting up tools to install…
☆10 · Apr 13, 2023 · Updated 3 years ago
PastorGL / datacooker-etl
ETL processing toolset with SQL-like language and GIS capabilities, built on core Spark. Extensible and modular. REPL included
☆16 · Jun 12, 2026 · Updated last month
AhmetFurkanDEMIR / Flink-Example
Flink Example
☆17 · Nov 19, 2023 · Updated 2 years ago
Alaboy19 / model-retraining-gitops-fastapi
the full pipeline for model retraining with fastapi and github actions
☆16 · Jul 5, 2024 · Updated 2 years ago
ChaitanyaC22 / Deep-RL-Project---Maximize-total-profits-earned-by-cab-driver
The goal of this project is to build an RL-based algorithm that can help cab drivers maximize their profits by improving their decision-m…
☆13 · Jul 9, 2021 · Updated 5 years ago
sidharth178 / Youtube-Adview-Prediction
A Machine Learning project for Machine Learning Internship offered by InternshipStudio.
☆12 · Aug 8, 2021 · Updated 4 years ago
chandra1sekar / data-engineering
☆32 · Aug 13, 2018 · Updated 7 years ago
shahriar-rahman / Exploratory-Analysis-of-Netflix-Userbase
Data wrangling and Feature analysis based on the Netflix userbase sample Dataset.
☆14 · Dec 13, 2023 · Updated 2 years ago
HuJK / Code-Server-Hub
Jupyterhub like web page for code-server.
☆25 · May 27, 2026 · Updated 2 months ago