supratim94336/DataEngineeringCapstoneProject

😈Complete End to End ETL Pipeline with Spark, Airflow, & AWS

☆51

Alternatives and similar repositories for DataEngineeringCapstoneProject

Users who are interested in DataEngineeringCapstoneProject are comparing it to the repositories listed below.

CICIFLY / Data_Engineering_Project_Portfolio
Data Engineering, Data Warehouse, Data Mart, Cloud Data, AWS, SAS, Redshift, S3
☆33 · Feb 2, 2021 · Updated 5 years ago
fpcarneiro / data-engineer-project
Data Engineering Capstone
☆17 · Oct 10, 2019 · Updated 6 years ago
siddharth271101 / Covid-19-and-Aviation-Industry
The goal of this project is to analyse the impact of Covid-19 on the Aviation industry through data engineering processes using technolog…
☆13 · Jun 26, 2022 · Updated 4 years ago
judeleonard / Prescriber-ETL-data-pipeline
An End-to-End ETL data pipeline that leverages pyspark parallel processing to process about 25 million rows of data coming from a SaaS ap…
☆27 · Dec 7, 2022 · Updated 3 years ago
santoshjoshi / Apache-Kafka
Apache Kafka Overview
☆12 · Jun 9, 2023 · Updated 3 years ago
jamesbyars / apache-spark-etl-pipeline-example
Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin…
☆24 · Aug 11, 2023 · Updated 2 years ago
damklis / DataEngineeringProject
Example end to end data engineering project.
☆1,418 · Dec 8, 2022 · Updated 3 years ago
dylanzenner / business_closures_de_pipeline
Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DocumentDB as the database
☆14 · Oct 26, 2021 · Updated 4 years ago
jukkakansanaho / udacity-dend-project-3
Udacity Data Engineer Nano Degree - Project-3 (Data Warehouse)
☆22 · Jun 20, 2019 · Updated 7 years ago
satishrath185 / Product-Recommendation
Product Recommender System for Retail Dataset
☆16 · Sep 22, 2020 · Updated 5 years ago
ankitbansal6 / netflix_data_cleaning_analysis
☆16 · May 14, 2024 · Updated 2 years ago
miztiik / s3-to-rds-with-glue
Extract, transform, and load data for analytic processing using AWS Glue
☆17 · May 2, 2021 · Updated 5 years ago
aws-samples / aws-glue-job-status-email-report
☆18 · Aug 15, 2022 · Updated 3 years ago
renatootescu / ETL-pipeline
Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow.
☆351 · Jan 12, 2022 · Updated 4 years ago
manuel-lang / Data-Engineering-Nanodegree
Solution to all projects of Udacity's Data Engineering Nanodegree: Data Modeling with Postgres & Cassandra, Data Warehouse with Redshift,…
☆58 · Oct 20, 2022 · Updated 3 years ago
emrspecialistsamer / aws-glue-workshop
Repository for AWS Glue Workshop
☆32 · Jan 4, 2023 · Updated 3 years ago
pregismond / python-project-for-data-engineering
Acquiring and processing information on world's largest banks
☆19 · Apr 19, 2026 · Updated 3 months ago
iam-mhaseeb / Skytrax-Data-Warehouse
A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for …
☆138 · Apr 18, 2020 · Updated 6 years ago
shravan-kuchkula / udacity-data-eng-proj2
A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract d…
☆24 · Nov 22, 2021 · Updated 4 years ago
alanchn31 / Data-Engineering-Projects
Personal Data Engineering Projects
☆1,024 · Feb 8, 2023 · Updated 3 years ago
arverma / TowardsDataEngineering
This repo contains commands that data engineers use in day to day work.
☆64 · Feb 4, 2023 · Updated 3 years ago
im-nsk / Building-an-Automated-Weather-Data-Pipeline-with-Airflow-From-Ingestion-to-Data-Warehouse
This project focuses on building a robust data pipeline using Apache Airflow to automate the ingestion of weather data from the OpenWeath…
☆22 · Feb 3, 2026 · Updated 5 months ago
vsouza / spark-kinesis-redshift
Example project for consuming AWS Kinesis streamming and save data on Amazon Redshift using Apache Spark
☆11 · May 22, 2018 · Updated 8 years ago
aravinthsci / Spark_Delta_Lake
Delta Lake Examples
☆11 · Apr 24, 2020 · Updated 6 years ago
san089 / Udacity-Data-Engineering-Projects
Few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake developme…
☆1,953 · Aug 26, 2022 · Updated 3 years ago
Pushkr / Apache-Spark-Hands-On
Educational notes,Hands on problems w/ solutions for hadoop ecosystem
☆87 · Jan 22, 2019 · Updated 7 years ago
thuanyvermelho / projeto_gcp_batch_dataflow_bigquery
ELT dos voos da ANAC, utilizando Dataflow com Apache Beam e BigQuery
☆14 · May 23, 2024 · Updated 2 years ago
dem108 / AMLWorkshop-IotEdge-DevOps
This repo has some proposed agenda for Azure Machine Learning related hands-on workshops.
☆11 · Feb 2, 2021 · Updated 5 years ago
patelatharva / Data_Pipelines_with_Apache_Airflow
Creating Data Pipelines with Apache Airflow to manage ETL from Amazon S3 into Amazon Redshift
☆14 · Jun 12, 2019 · Updated 7 years ago
darenasc / data-science-for-good
Data Science for Good links.
☆14 · Nov 10, 2021 · Updated 4 years ago
eandbsoftware / AZURE-DP900
Azure DP 900 notes
☆10 · Jan 23, 2022 · Updated 4 years ago
RajenDharmendra / SparkQA
Apache Spark Interview Question and Answers
☆21 · Oct 13, 2020 · Updated 5 years ago
techmonad / spark-data-pipeline
This project describes how to write full ETL data pipeline using spark.
☆15 · Oct 15, 2022 · Updated 3 years ago
greole / foamMon
A simple tool for monitoring the progress of OpenFOAM simulations
☆13 · Nov 9, 2018 · Updated 7 years ago
hackersandslackers / redis-python-tutorial
Leverage in-memory data storage to make your Python apps snappy.
☆22 · Jul 10, 2026 · Updated last week
aws-samples / severless-ticket-sentiment-analysis-and-automated-escalation
This application "listens" for a ticket creation event from Zendesk, analyses the ticket for negative sentiment, tags the ticket accordin…
☆14 · Mar 10, 2025 · Updated last year
roenby / blockMesh
Matlab toolbox for generating block structured hex meshes in the polyMesh file format of OpenFOAM.
☆14 · Jan 2, 2013 · Updated 13 years ago
atbaker / five-ways-to-deploy
The source code for my PyCon 2017 talk "5 ways to deploy you Python web app in 2017"
☆10 · May 19, 2017 · Updated 9 years ago
ksdiwe / Youtube-Data-Scrapping-End-to-End-Project
☆12 · Jan 2, 2024 · Updated 2 years ago