ismaildawoodjee / aws-data-pipeline
A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from locally hosted Airflow containers. The end product is a Superset dashboard and a Postgres database, hosted on an EC2 instance at this address (powered down):
☆23 · May 14, 2022 · Updated 3 years ago
Alternatives and similar repositories for aws-data-pipeline
Users that are interested in aws-data-pipeline are comparing it to the libraries listed below
- I am using a Confluent Kafka cluster to produce and consume scraped data. In this project, I've created a real-time data pipeline that uti… ☆29 · May 2, 2023 · Updated 2 years ago
- This project involves an ETL (Extract, Transform, Load) process to analyze sleep data exported from Apple Health ☆29 · Apr 29, 2023 · Updated 2 years ago
- A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in … ☆25 · Aug 30, 2022 · Updated 3 years ago
- Stream/batch system with Hadoop, Spark on NYC taxi data | #DE ☆26 · Sep 27, 2025 · Updated 4 months ago
- Simple ETL pipeline using Python ☆29 · May 22, 2023 · Updated 2 years ago
- This project aims to build a traveling recommendation application using Google Places API and OpenAI LLM. ☆11 · Mar 19, 2024 · Updated last year
- ☆14 · Sep 14, 2021 · Updated 4 years ago
- pytest plugin extending allure behaviour ☆12 · Feb 1, 2026 · Updated last week
- PredictorFinc is a scalable supervised machine learning model that predicts stock price change through a Decision Tree Regressor using data … ☆12 · Sep 5, 2023 · Updated 2 years ago
- ☆16 · Oct 8, 2025 · Updated 4 months ago
- Power Plant ML Pipeline Application - Apache Spark ☆12 · Dec 12, 2016 · Updated 9 years ago
- ecommerce GCP Streaming pipeline: Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP Batch pipelin… ☆11 · Mar 9, 2022 · Updated 3 years ago
- The goal of this project is to analyse the impact of Covid-19 on the Aviation industry through data engineering processes using technolog… ☆13 · Jun 26, 2022 · Updated 3 years ago
- Analyse Spotify playlists, albums and artists. ☆35 · Nov 15, 2022 · Updated 3 years ago
- 20 python libs and more: read me first! ☆12 · Apr 11, 2024 · Updated last year
- Automatically perform exploratory data analysis, and generate a report in Word '.docx' format. ☆10 · Jan 8, 2026 · Updated last month
- A collection of data analysis projects done using PySpark via Jupyter notebooks. ☆10 · Oct 8, 2022 · Updated 3 years ago
- Classwork projects and homework done through the Udacity Data Engineering Nanodegree ☆10 · Jun 6, 2021 · Updated 4 years ago
- Rasa Chatbot using Django backend and Sockets for communication ☆12 · Dec 8, 2022 · Updated 3 years ago
- A collection of python utility functions ☆11 · Updated this week
- Solved data engineering exercises using PySpark ☆15 · Aug 2, 2021 · Updated 4 years ago
- Documentation on using the built-in Python debugger, PDB. ☆23 · Dec 8, 2022 · Updated 3 years ago
- Integration of Clinical Embeddings with Neural ODEs ☆11 · Jan 6, 2025 · Updated last year
- Firefox extension that shows the Parquet schema when browsing GCP Cloud Storage. Uses DuckDB WASM ☆12 · Jan 19, 2024 · Updated 2 years ago
- Extension to Python-Markdown to translate pydantic's model fields into a Markdown table ☆12 · Apr 19, 2024 · Updated last year
- Code for 'Contrastive Multi-Document Question Generation' ☆11 · Oct 16, 2022 · Updated 3 years ago
- Interactive Graphic for Exploring Liver Function Data in Clinical Trials ☆11 · Mar 4, 2023 · Updated 2 years ago
- You can use this code to train on any font style of English letters and numbers. This code is so powerful when it comes to extracting Text… ☆10 · Apr 26, 2021 · Updated 4 years ago
- The repository includes detailed steps to get data from GES DISC, convert HDF5 files to CSV and plotting geographic data. ☆11 · Aug 17, 2020 · Updated 5 years ago
- This repository has a tool and an API for Saudi CERT alerts. Its goal is to help improve the level of cybersecurity awareness in Saudi Ar… ☆13 · Nov 16, 2023 · Updated 2 years ago
- An app that makes it easy to connect to a user's data warehouse and make a dashboard out of it. ☆15 · Feb 6, 2022 · Updated 4 years ago
- Acquiring and processing information on world's largest banks ☆17 · Jun 17, 2025 · Updated 7 months ago
- This is a simple script that parses Python files in a directory and generates an mxfile containing a diagram of classes, attributes and m… ☆11 · Feb 23, 2023 · Updated 2 years ago
- The project focuses on the drowsiness of IT employees, drivers, pilots, crane operators, students, etc. These people need a system which ca… ☆14 · Sep 13, 2018 · Updated 7 years ago
- ☆11 · Aug 11, 2022 · Updated 3 years ago
- AWS S3 plugin for dvc ☆13 · Jan 26, 2026 · Updated 2 weeks ago
- Supercharged pandas indexing ☆11 · Mar 28, 2021 · Updated 4 years ago
- A project for the development of rich geospatial data from the city of São Paulo for use in Machine Learning models. ☆11 · Jul 4, 2021 · Updated 4 years ago
- 🗺️ An interactive scratch off map. Keep track of which places you have been, how much of the world you have conquered, and where to go n… ☆11 · Jan 27, 2026 · Updated 2 weeks ago