shravan-kuchkula/udacity-data-eng-proj4

Developed an ETL pipeline for a Data Lake that extracts data from S3, processes the data using Spark, and loads the data back into S3 as a set of dimensional tables. Lake Processing: Spark, Lake Storage: S3

☆17

Alternatives and similar repositories for udacity-data-eng-proj4

Users who are interested in udacity-data-eng-proj4 are comparing it to the libraries listed below.

shravan-kuchkula / dataEngineering
View on GitHub
A repo to track data engineering projects
☆14Nov 11, 2022Updated 3 years ago
shravan-kuchkula / udacity-data-eng-proj-1
View on GitHub
Developed a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation,…
☆89Nov 22, 2021Updated 4 years ago
san089 / Optimizing-Public-Transportation
View on GitHub
A real-time event pipeline around Kafka Ecosystem for Chicago Transit Authority.
☆33Aug 14, 2023Updated 2 years ago
jamesbyars / apache-spark-etl-pipeline-example
View on GitHub
Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin…
☆24Aug 11, 2023Updated 2 years ago
Prakash-Ponnusamy1 / CCA175_Master_Preparation
View on GitHub
☆19Apr 9, 2020Updated 6 years ago
datascience-course / 2021-datascience-lectures
View on GitHub
☆13Apr 22, 2021Updated 5 years ago
vsouza / spark-kinesis-redshift
View on GitHub
Example project for consuming an AWS Kinesis stream and saving the data to Amazon Redshift using Apache Spark
☆11May 22, 2018Updated 8 years ago
ajupton / big-data-engineering-project
View on GitHub
Big Data Engineering practice project, including ETL with Airflow and Spark using AWS S3 and EMR
☆92Jul 17, 2019Updated 7 years ago
Aaron-K-T-Berry / airflow-docker-boilerplate
View on GitHub
☆11Updated this week
cloudera / cml-training
View on GitHub
Example Python and R code for Cloudera Machine Learning (CML) training
☆14Dec 1, 2020Updated 5 years ago
anthonywong611 / Batch-ETL-with-AWS-EMR-and-MWAA
View on GitHub
Create a data pipeline on AWS to execute batch processing in a Spark cluster provisioned by Amazon EMR. ETL using managed airflow: extrac…
☆10Jul 12, 2021Updated 5 years ago
dgadiraju / itversity-boxes
View on GitHub
Repository for all ITVersity Vagrant Boxes.
☆32Apr 23, 2020Updated 6 years ago
dstibrany / SimpleDB
View on GitHub
SimpleDB implementation for MIT 6.830
☆12Nov 15, 2019Updated 6 years ago
codingvarun / streaming-elt-pipeline
View on GitHub
This is a real-life, high throughput streaming ELT data pipeline for ecommerce
☆15May 22, 2023Updated 3 years ago
saboye / Data-Modeling-with-Postgres
View on GitHub
A project to design a fact and dimension star schema for optimizing queries on a flight booking database using PostgreSQL, a relational d…
☆12Aug 15, 2021Updated 4 years ago
MicrosoftDocs / mslearn-cv-classify-bird-species
View on GitHub
Data and source for Azure Computer Vision classify birds with Python SDK
☆11Jan 20, 2021Updated 5 years ago
latinacode / Wrangle-and-Analyze-Data
View on GitHub
Udacity Data Analyst Nanodegree Project 7 - Wrangle and Analyze WeRateDogs Twitter account.
☆13May 26, 2018Updated 8 years ago
henokyemam / Wrangling_PySpark
View on GitHub
☆12Dec 28, 2020Updated 5 years ago
chandu-muthyala / Data-Engineer-Nano-Degree
View on GitHub
Built data models, data warehouses, and data lakes; automated data pipelines; and worked with massive datasets.
☆12Jul 16, 2019Updated 7 years ago
vim89 / datapipelines-essentials-python
View on GitHub
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…
☆56May 6, 2023Updated 3 years ago
fancellu / neo4j-d3v4
View on GitHub
Neo4j 3.x accessed via bolt JS driver, plugged into D3 v4 force simulation
☆18Apr 2, 2017Updated 9 years ago
benawad / baklava
View on GitHub
Trello clone GraphQL Node.js backend
☆11Aug 23, 2017Updated 8 years ago
lp-dataninja / SparkML
View on GitHub
Detailed notes and code to learn machine learning with Apache Spark.
☆12Sep 24, 2018Updated 7 years ago
ajithshetty / data-engineering-rust-demo
View on GitHub
Rust And Delta Demo. Explanation and walkthrough on delta-rs
☆10Aug 21, 2023Updated 2 years ago
murkenson / movies_tv_shows_data_pipeline
View on GitHub
Final Project for Data Engineering Zoomcamp Course 2024 🧙🔥
☆11Apr 17, 2024Updated 2 years ago
semashkinvg / DataVault
View on GitHub
☆16Jan 20, 2019Updated 7 years ago
jess197 / football_statistics_etl_project
View on GitHub
☆13Dec 28, 2023Updated 2 years ago
alvintoh / udemy-hands-on-hadoop
View on GitHub
AlvinToh Learning Repository for The Ultimate Hands-On Hadoop - Tame your Big Data!
☆10May 23, 2018Updated 8 years ago
ybangaru / wallstreetbets-sentiment-analysis
View on GitHub
☆10May 24, 2021Updated 5 years ago
MuhammadIbtisam / ai-engineer-roadmap
View on GitHub
A complete, hands-on roadmap to becoming an AI Engineer, from Python basics to production RAG, Agents, and Fine-Tuning. 10 modules, 20+ n…
☆24Apr 19, 2026Updated 3 months ago
1rocketdude / pyetrade_option_chains
View on GitHub
exemplar code to download all option chains for a symbol using pyetrade (V1 Etrade API)
☆11Sep 28, 2021Updated 4 years ago
anbento0490 / tutorials
View on GitHub
☆21Jan 21, 2023Updated 3 years ago
aws-samples / severless-ticket-sentiment-analysis-and-automated-escalation
View on GitHub
This application "listens" for a ticket creation event from Zendesk, analyses the ticket for negative sentiment, tags the ticket accordin…
☆14Mar 10, 2025Updated last year
rsanjabi / short-term-rentals-warehouse
View on GitHub
Pipeline, warehouse, and visualization tools for investigating the impact of Airbnb short-term rentals on world cities.
☆15Jun 9, 2023Updated 3 years ago
koresar / s3-tree
View on GitHub
Generates a tree of an S3 bucket contents
☆12Sep 18, 2020Updated 5 years ago
fpcarneiro / data-engineer-project
View on GitHub
Data Engineering Capstone
☆17Oct 10, 2019Updated 6 years ago
kunalBhashkar / Bank-Marketing-Data-Set-Classification
View on GitHub
Bank Marketing data classification
☆12Oct 2, 2020Updated 5 years ago
JoeReis / context_graph_prototype
View on GitHub
☆29Jan 11, 2026Updated 6 months ago
AdeboyeML / UK_Accident_Traffic_ETL_Pipeline
View on GitHub
This is a capstone project that entails building an end-to-end ETL (Extract-Transform-Load) Data pipeline which extracts UK accident and …
☆18Jun 6, 2020Updated 6 years ago