Big Data Engineering practice project, including ETL with Airflow and Spark using AWS S3 and EMR
☆90 Jul 17, 2019 Updated 6 years ago
Alternatives and similar repositories for big-data-engineering-project
Users interested in big-data-engineering-project are comparing it to the repositories listed below
- 😈 Complete End to End ETL Pipeline with Spark, Airflow, & AWS ☆51 Aug 23, 2019 Updated 6 years ago
- A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousin… ☆15 Apr 29, 2021 Updated 4 years ago
- Example project for consuming AWS Kinesis streaming data and saving it to Amazon Redshift using Apache Spark ☆11 May 22, 2018 Updated 7 years ago
- My solutions for the Udacity Data Engineering Nanodegree ☆34 Oct 14, 2019 Updated 6 years ago
- Udacity Data Engineering Nanodegree Capstone Project ☆37 May 9, 2020 Updated 5 years ago
- With everything I learned from DEZoomcamp from datatalks.club, this project performs a batch processing on AWS for the cycling dataset wh… ☆15 Jan 4, 2026 Updated 2 months ago
- Data Engineering, Data Warehouse, Data Mart, Cloud Data, AWS, SAS, Redshift, S3 ☆32 Feb 2, 2021 Updated 5 years ago
- Developed an ETL pipeline for a Data Lake that extracts data from S3, processes the data using Spark, and loads the data back into S3 as … ☆17 Oct 1, 2019 Updated 6 years ago
- ecommerce GCP Streaming pipeline: Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP Batch pipelin… ☆11 Mar 9, 2022 Updated 3 years ago
- Example end to end data engineering project. ☆1,387 Dec 8, 2022 Updated 3 years ago
- Udacity Data Engineering Nanodegree Project 3 ☆12 Jul 14, 2019 Updated 6 years ago
- ☆39 Jan 4, 2026 Updated 2 months ago
- Processing TfL data for bike usage with Google Cloud Platform. ☆46 Jul 15, 2022 Updated 3 years ago
- A few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake developme… ☆1,826 Aug 26, 2022 Updated 3 years ago
- A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract d… ☆24 Nov 22, 2021 Updated 4 years ago
- Simple ETL pipeline using Python ☆29 May 22, 2023 Updated 2 years ago
- Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow. ☆348 Jan 12, 2022 Updated 4 years ago
- This repo contains commands that data engineers use in day to day work. ☆61 Feb 4, 2023 Updated 3 years ago
- ☆14 Jul 22, 2018 Updated 7 years ago
- Jupyter Notebook showing how to process Telecom datasets using PySpark (SparkSQL and DataFrames) and plotting the results using Matplotli… ☆16 Dec 3, 2018 Updated 7 years ago
- Airflow ETL for Meetup API ☆45 Dec 27, 2018 Updated 7 years ago
- Solution to all projects of Udacity's Data Engineering Nanodegree: Data Modeling with Postgres & Cassandra, Data Warehouse with Redshift,… ☆57 Oct 20, 2022 Updated 3 years ago
- ☆27 Feb 2, 2018 Updated 8 years ago
- Projects done in the Data Engineering Nanodegree by Udacity.com ☆272 Aug 7, 2019 Updated 6 years ago
- ETL (Extract, Transform and Load) with the Spark Python API (PySpark) and Hadoop Distributed File System (HDFS) ☆17 Dec 18, 2018 Updated 7 years ago
- A repo to track data engineering projects ☆13 Nov 11, 2022 Updated 3 years ago
- Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DocumentDB as the database ☆14 Oct 26, 2021 Updated 4 years ago
- An end-to-end data engineering pipeline to create a dashboard for the latest content on the r/Stocks subreddit ☆20 Aug 5, 2022 Updated 3 years ago
- Data Engineering Capstone ☆17 Oct 10, 2019 Updated 6 years ago
- This repository contains my solutions to the top 50 LeetCode SQL challenges implemented using PySpark DataFrame and PySpark SQL. ☆28 Mar 16, 2024 Updated last year
- Classwork projects and homework done through Udacity data engineering nano degree ☆76 Dec 12, 2023 Updated 2 years ago
- Resources and projects from Udacity Data Engineering with AWS nano degree programme ☆27 Apr 12, 2023 Updated 2 years ago
- A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation an… ☆23 Nov 21, 2023 Updated 2 years ago
- Spark application for analyzing Apache access logs and detecting anomalies, along with a Medium article. ☆21 Jan 30, 2019 Updated 7 years ago
- This GitHub project provides a series of lab exercises which help users get started using the Redshift platform. ☆53 Mar 31, 2021 Updated 4 years ago
- This project helps me to understand the core concepts of Apache Airflow. I have created custom operators to perform tasks such as staging… ☆99 Aug 11, 2019 Updated 6 years ago
- Udacity Data Engineering Nano Degree (DEND) ☆190 Jan 20, 2020 Updated 6 years ago
- A way for home buyers to know about factors affecting a state ☆48 Mar 2, 2019 Updated 7 years ago
- Engineer streaming processing data pipeline on Azure with the main purpose to ingest and process tweets and satellite images data from Hu… ☆23 Apr 8, 2021 Updated 4 years ago