A batch-processing data pipeline built on AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform and orchestrated from locally hosted Airflow containers. The end product is a Superset dashboard and a Postgres database, hosted on an EC2 instance at this address (powered down):
☆23 · May 14, 2022 · Updated 3 years ago
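An orchestrator like Airflow runs a batch pipeline as a DAG, executing each task only after its upstream tasks finish. The sketch below assumes a simple linear stage order (stage raw data in S3, process on EMR, load into Redshift, refresh the Superset dashboard). The task names are hypothetical and are not taken from the repository. It uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to compute one valid execution order, which is the same dependency resolution a scheduler performs.

```python
from graphlib import TopologicalSorter

# Hypothetical task names: the repository's actual Airflow DAG is not shown here.
# Each key maps a task to the set of tasks that must finish before it runs,
# mirroring upstream >> downstream chaining in an Airflow DAG.
PIPELINE = {
    "stage_raw_data_in_s3": set(),
    "process_on_emr": {"stage_raw_data_in_s3"},
    "load_into_redshift": {"process_on_emr"},
    "refresh_superset_dashboard": {"load_into_redshift"},
}


def run_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for step in run_order(PIPELINE):
        print(step)
```

Because every stage here has exactly one predecessor, the order is unique. A real DAG with parallel branches (for example, several EMR steps feeding one Redshift load) would admit several valid orders, and a scheduler could run independent branches concurrently.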
Alternatives and similar repositories for aws-data-pipeline
Users that are interested in aws-data-pipeline are comparing it to the libraries listed below
- A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousin… ☆15 · Apr 29, 2021 · Updated 4 years ago
- An end-to-end ETL data pipeline that leverages PySpark parallel processing to process about 25 million rows of data coming from a SaaS ap… ☆25 · Dec 7, 2022 · Updated 3 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validatio… ☆56 · May 6, 2023 · Updated 2 years ago
- ☆16 · Feb 20, 2026 · Updated 2 weeks ago
- ☆14 · Sep 14, 2021 · Updated 4 years ago
- CSS & HTML on Python Easily ☆11 · Sep 23, 2024 · Updated last year
- This project aims to build a travel recommendation application using the Google Places API and an OpenAI LLM. ☆11 · Mar 19, 2024 · Updated last year
- The best Python package for comparing two dataframes ☆11 · Dec 29, 2021 · Updated 4 years ago
- pytest plugin extending allure behaviour ☆13 · Feb 8, 2026 · Updated 3 weeks ago
- PredictorFinc is a scalable supervised machine learning model that predicts stock price change through a Decision Tree Regressor using data … ☆12 · Sep 5, 2023 · Updated 2 years ago
- Power Plant ML Pipeline Application - Apache Spark ☆12 · Dec 12, 2016 · Updated 9 years ago
- E-commerce GCP streaming pipeline: Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP batch pipelin… ☆11 · Mar 9, 2022 · Updated 3 years ago
- Python utility to extract differences between two pandas dataframes. ☆11 · Apr 8, 2025 · Updated 10 months ago
- Configuration system geared towards Python ML projects ☆11 · Apr 30, 2023 · Updated 2 years ago
- My applied big data analytics project with PySpark. ☆10 · Sep 21, 2022 · Updated 3 years ago
- The goal of this project is to analyse the impact of Covid-19 on the aviation industry through data engineering processes using technolog… ☆13 · Jun 26, 2022 · Updated 3 years ago
- 20 Python libs and more: read me first! ☆12 · Apr 11, 2024 · Updated last year
- Automatically perform exploratory data analysis and generate a report in Word '.docx' format. ☆10 · Feb 11, 2026 · Updated 3 weeks ago
- Analyse Spotify playlists, albums and artists. ☆35 · Nov 15, 2022 · Updated 3 years ago
- Classwork projects and homework done through the Udacity Data Engineering Nanodegree ☆10 · Jun 6, 2021 · Updated 4 years ago
- This repository has a tool and an API for Saudi CERT alerts. Its goal is to help improve the level of cybersecurity awareness in Saudi Ar… ☆13 · Nov 16, 2023 · Updated 2 years ago
- Self-exploratory Streamlit app to learn more about the Palmer penguins. ☆11 · Jun 26, 2023 · Updated 2 years ago
- A collection of Python utility functions ☆11 · Feb 11, 2026 · Updated 3 weeks ago
- This project provides end-to-end data processing and visualization of visa numbers in Japan using PySpark and Plotly. The Spark cluste… ☆12 · Oct 11, 2023 · Updated 2 years ago
- Interactive Graphic for Exploring Liver Function Data in Clinical Trials