Create a data pipeline on AWS that runs batch processing on a Spark cluster provisioned by Amazon EMR. The ETL is orchestrated with managed Airflow (MWAA): it extracts data from S3, transforms the data using Spark, and loads the transformed data back to S3.
☆10Jul 12, 2021Updated 4 years ago
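The extract, transform, and load flow described above is commonly submitted to EMR as a `spark-submit` step. The sketch below only builds such a step definition as a plain dictionary; it is not taken from the repository, whose actual DAG and scripts are not shown here. The bucket name, script path, step name, and the `--input`/`--output` arguments are hypothetical placeholders.

```python
from typing import Any, Dict


def build_spark_step(script_uri: str, input_uri: str, output_uri: str) -> Dict[str, Any]:
    """Build one EMR step definition that runs a PySpark script via spark-submit.

    The --input/--output flags are assumptions: they only work if the
    (hypothetical) PySpark script parses arguments with those names.
    """
    return {
        "Name": "transform-with-spark",  # hypothetical step name
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets an EMR step run a command such as spark-submit
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_uri,
                "--input", input_uri,
                "--output", output_uri,
            ],
        },
    }


# Hypothetical S3 locations: raw data in, transformed data out.
step = build_spark_step(
    "s3://example-bucket/scripts/transform.py",
    "s3://example-bucket/raw/",
    "s3://example-bucket/transformed/",
)
```

In an Airflow DAG, a list of step dictionaries like this is typically passed as the `steps` argument of `EmrAddStepsOperator` from the Amazon provider package, with the cluster identified by `job_flow_id`.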
Alternatives and similar repositories for Batch-ETL-with-AWS-EMR-and-MWAA
Users that are interested in Batch-ETL-with-AWS-EMR-and-MWAA are comparing it to the libraries listed below.
- This project aims to rate football players using data and statistics recorded from the last match they participated in. Much of the code …☆12Nov 22, 2021Updated 4 years ago
- Spark data pipeline that processes movie ratings data.☆31Mar 1, 2026Updated 3 weeks ago
- A real-time streaming ETL pipeline for streaming and performing sentiment analysis on Twitter data using Apache Kafka, Apache Spark and D…☆29Aug 8, 2020Updated 5 years ago
- ☆22Jul 29, 2024Updated last year
- ☆34Feb 19, 2026Updated last month
- Solution to Data at ANZ virtual internship on Forage☆10May 30, 2021Updated 4 years ago
- Alzheimer’s Disease (AD) is a neurological brain disorder marked by dementia and neurological dysfunction that affects memory, behavioral…☆16Aug 28, 2022Updated 3 years ago
- Example project for consuming an AWS Kinesis stream and saving the data to Amazon Redshift using Apache Spark☆11May 22, 2018Updated 7 years ago
- Image Segmentation using Fully Convolutional Networks in PyTorch☆11May 16, 2019Updated 6 years ago
- This project focuses on building a robust data pipeline using Apache Airflow to automate the ingestion of weather data from the OpenWeath…☆22Feb 3, 2026Updated last month
- Delta Live Tables Workshop Resources☆17Feb 24, 2023Updated 3 years ago
- Alzheimer's / dementia progression classifier for MRIs using CNNs and transfer learning☆18Jan 22, 2018Updated 8 years ago
- Repository created to host Udacity data engineer exercises☆11Mar 1, 2026Updated 3 weeks ago
- A repo to track data engineering projects☆13Nov 11, 2022Updated 3 years ago
- ☆21Jan 13, 2024Updated 2 years ago
- 🚚 ETL for Spark and Airflow☆25Mar 19, 2018Updated 8 years ago
- Insight Data Engineering project: A platform built in HDFS, Spark and Airflow to help you to find social influencers from GitHub Net…☆16May 21, 2024Updated last year
- Power Pop Health is a collection of content intended to simplify the process of ingesting and prepping Healthcare Open Data using Azure d…☆18May 23, 2022Updated 3 years ago
- A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousin…☆15Apr 29, 2021Updated 4 years ago
- This is a simple ETL using Airflow. First, we fetch data from API (extract). Then, we drop unused columns, convert to CSV, and validate (…☆24Oct 12, 2019Updated 6 years ago
- A CI/CD pipeline for a machine learning model on Google Cloud Run☆32May 1, 2023Updated 2 years ago
- Repository with basic information on files and documents for football data visualization with Python☆71May 26, 2025Updated 10 months ago
- Tutorials of data science concepts and packages in Python☆21Feb 11, 2016Updated 10 years ago
- Developed an ETL pipeline for a Data Lake that extracts data from S3, processes the data using Spark, and loads the data back into S3 as …☆17Oct 1, 2019Updated 6 years ago
- Image Classification with transfer learning | a PyTorch Tutorial to Transfer Learning☆22Jul 25, 2024Updated last year
- ☆27Mar 7, 2022Updated 4 years ago
- Tweepy Stream Example☆19Apr 23, 2019Updated 6 years ago
- Databricks CI/CD using Azure DevOps☆21Nov 1, 2022Updated 3 years ago
- CodeSignal CodeFights SQL Database queries☆23Dec 26, 2019Updated 6 years ago
- Loan Default Prediction using PySpark, with jobs scheduled by Apache Airflow and Integration with Spark using Apache Livy☆22Dec 26, 2020Updated 5 years ago
- PySpark functions and utilities with examples. Assists ETL process of data modeling☆104Dec 3, 2020Updated 5 years ago
- Football Data Processing & Visualization☆86Jun 1, 2024Updated last year
- ITMD - 526 Data Warehousing☆29May 9, 2016Updated 9 years ago
- Data and regressions on Premier League teams from 2000-01 through to 2016-17☆11Jul 31, 2017Updated 8 years ago
- Implementing Meta AI's VGGT as a FiftyOne Remote Zoo Model☆20Jun 20, 2025Updated 9 months ago
- Refactor your code with local LLM in VSCode☆13Mar 14, 2024Updated 2 years ago
- Project evaluating player skill under pressure using Statsbomb public event level data☆11Jun 22, 2019Updated 6 years ago
- This is a capstone project that entails building an end-to-end ETL (Extract-Transform-Load) Data pipeline which extracts UK accident and …☆18Jun 6, 2020Updated 5 years ago
- In this project I used Apache Airflow to scrape a website periodically. This is for the tutorials I do on YouTube.☆10Nov 21, 2022Updated 3 years ago