jkoth / Data-Lake-with-Spark-and-AWS-S3

Create a data lake on AWS S3 to store dimensional tables after processing data with Spark on an AWS EMR cluster

☆9

Alternatives and similar repositories for Data-Lake-with-Spark-and-AWS-S3

Users interested in Data-Lake-with-Spark-and-AWS-S3 are comparing it to the repositories listed below

shravan-kuchkula / udacity-data-eng-proj4
Developed an ETL pipeline for a Data Lake that extracts data from S3, processes the data using Spark, and loads the data back into S3 as …
☆16 Updated 5 years ago
ajupton / big-data-engineering-project
Big Data Engineering practice project, including ETL with Airflow and Spark using AWS S3 and EMR
☆84 Updated 5 years ago
shravan-kuchkula / udacity-data-eng-proj2
A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract d…
☆24 Updated 3 years ago
raveendratal / ravi_azureadbadf
Ravi Azure ADB ADF Repository
☆66 Updated 4 months ago
arverma / TowardsDataEngineering
This repo contains commands that data engineers use in day to day work.
☆61 Updated 2 years ago
vim89 / datapipelines-essentials-python
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…
☆55 Updated 2 years ago
itversity / data-engineering-spark
☆87 Updated 2 years ago
alanchn31 / Movalytics-Data-Warehouse
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
☆146 Updated 4 years ago
supratim94336 / DataEngineeringCapstoneProject
😈Complete End to End ETL Pipeline with Spark, Airflow, & AWS
☆46 Updated 5 years ago
patelatharva / Data_Pipelines_with_Apache_Airflow
Creating Data Pipelines with Apache Airflow to manage ETL from Amazon S3 into Amazon Redshift
☆14 Updated 5 years ago
hyunjoonbok / PySpark
PySpark functions and utilities with examples. Assists ETL process of data modeling
☆103 Updated 4 years ago
dgadiraju / retail_db
☆53 Updated 4 years ago
nareshk1290 / Udacity-Data-Engineering
Udacity Data Engineering Nano Degree (DEND)
☆185 Updated 5 years ago
vivek-bombatkar / Spark-with-Python---My-learning-notes-
ETL pipeline using pyspark (Spark - Python)
☆117 Updated 5 years ago
JoseRFJuniorLLMs / PySpark-ETL
PySpark-ETL
☆23 Updated 5 years ago
sankamuk / PysparkCheatsheet
PySpark Cheatsheet
☆36 Updated 2 years ago
johnny-chivers / aws-data-engineering
Resources for the free AWS Data Engineering course on youtube
☆100 Updated 3 years ago
damklis / etljob
Simple ETL pipeline using Python
☆26 Updated 2 years ago
LearningJournal / Spark-Streaming-In-Python
Apache Spark 3 - Structured Streaming Course Material
☆121 Updated last year
AnandDedha / aws-airflow-dataengineering-pipeline
☆21 Updated last year
BenSchr / Udacity-Data-Engineering-Projects
My solutions for the Udacity Data Engineering Nanodegree
☆34 Updated 5 years ago
Modingwa / Data-Engineering-Capstone-Project
Udacity Data Engineering Nanodegree Capstone Project
☆36 Updated 5 years ago
AuFeld / Data_Engineering_Projects
A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousin…
☆15 Updated 4 years ago
Saurav3218 / Pyspark_Questions_SKS
This repo is mostly created for pyspark and hive related interview questions.
☆47 Updated 3 years ago
SatadruMukherjee / Data-Preprocessing-Models
☆64 Updated 2 weeks ago
jamesbyars / apache-spark-etl-pipeline-example
Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin…
☆24 Updated last year
HoracioSoldman / batch-processing-on-aws
With everything I learned from DEZoomcamp from datatalks.club, this project performs a batch processing on AWS for the cycling dataset wh…
☆14 Updated 3 years ago
ddgope / Data-Pipelines-with-Airflow
This project helps me to understand the core concepts of Apache Airflow. I have created custom operators to perform tasks such as staging…
☆90 Updated 5 years ago
shravan-kuchkula / udacity-data-eng-proj-1
Developed a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation,…
☆90 Updated 3 years ago
itversity / spark-sql-and-pyspark-using-python3
Repository related to Spark SQL and Pyspark using Python3
☆38 Updated 2 years ago