An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
☆320 · Feb 14, 2025 · Updated last year
Alternatives and similar repositories for e2e-data-engineering
Users who are interested in e2e-data-engineering are comparing it to the repositories listed below
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ☆45 · Dec 11, 2023 · Updated 2 years ago
- This project showcases how to integrate the world of DevOps, focusing on Continuous Integration (CI) and Continuous Deployment (CD) with … ☆14 · Dec 27, 2023 · Updated 2 years ago
- This project provides an end-to-end data processing and visualization of visa numbers in Japan using PySpark and Plotly. The spark cluste… ☆12 · Oct 11, 2023 · Updated 2 years ago
- This project shows how to capture changes from a Postgres database and stream them into Kafka ☆42 · May 17, 2024 · Updated last year
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA… ☆43 · Jan 4, 2024 · Updated 2 years ago
- This repository contains the necessary configuration files and DAGs (Directed Acyclic Graphs) for setting up a robust data engineering en… ☆25 · Jan 26, 2024 · Updated 2 years ago
- This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data wareh… ☆209 · Oct 23, 2023 · Updated 2 years ago
- This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessar… ☆48 · Dec 4, 2023 · Updated 2 years ago
- This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project de… ☆11 · Nov 18, 2023 · Updated 2 years ago
- Get data from API, run a scheduled script with Airflow, send data to Kafka and consume with Spark, then write to Cassandra ☆145 · Jul 27, 2023 · Updated 2 years ago
- Glue ETL job or EMR Spark that gets from data catalog, modifies and uploads to S3 and Data Catalog ☆13 · Aug 26, 2023 · Updated 2 years ago
- A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from loc… ☆23 · May 14, 2022 · Updated 3 years ago
- PySpark Tutorial for Beginners on Google Colab: Hands-On Guide ☆17 · Sep 13, 2020 · Updated 5 years ago
- Step by step instructions to create a production-ready data pipeline ☆58 · Dec 23, 2024 · Updated last year
- Produce Kafka messages, consume them and upload into Cassandra, MongoDB. ☆43 · Sep 26, 2023 · Updated 2 years ago
- Personal Data Engineering Projects ☆1,001 · Feb 8, 2023 · Updated 3 years ago
- Uses Airflow, Postgres, Kafka, Spark, Cassandra, and GitHub Actions to establish an end-to-end data pipeline ☆30 · Oct 25, 2023 · Updated 2 years ago
- Coursera Lab for MLOps course covers Hugging Face ☆12 · Apr 9, 2023 · Updated 2 years ago
- A series on learning Apache Spark (PySpark), with quick tips and workarounds for everyday problems ☆56 · Sep 30, 2023 · Updated 2 years ago
- End-to-end data engineering project with Kafka, Airflow, Spark, Postgres, and Docker. ☆110 · Jan 8, 2026 · Updated 2 months ago
- Practical Data Engineering: A Hands-On Real-Estate Project Guide ☆792 · Mar 10, 2026 · Updated last week
- Realtime Data Engineering Project ☆30 · Jan 12, 2025 · Updated last year
- YouTube tutorial project ☆108 · Oct 17, 2023 · Updated 2 years ago
- Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow. ☆347 · Jan 12, 2022 · Updated 4 years ago
- Practice your PySpark skills! ☆103 · Oct 22, 2021 · Updated 4 years ago
- Example end-to-end data engineering project. ☆1,394 · Dec 8, 2022 · Updated 3 years ago
- A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more! ☆864 · Apr 16, 2022 · Updated 3 years ago
- StreamSoft enables real-time analysis of any stock market ☆15 · Apr 24, 2024 · Updated last year
- ☆212 · Aug 13, 2023 · Updated 2 years ago
- Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Jo… ☆39,165 · Mar 12, 2026 · Updated last week
- ☆215 · Jan 22, 2025 · Updated last year
- Repository for Data Engineering Interview Series ☆36 · Oct 17, 2024 · Updated last year
- A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, Dbt, Polars, Doc… ☆23 · Nov 19, 2024 · Updated last year
- Capstone project for the Dataengineer.io bootcamp (public repo) ☆12 · Feb 20, 2024 · Updated 2 years ago
- This repository contains code and configuration files for an Extract, Transform, Load (ETL) project using Google Cloud Data Fusion for da… ☆20 · Feb 23, 2024 · Updated 2 years ago
- ☆16 · Mar 9, 2026 · Updated 2 weeks ago
- Deploying a Bulletproof Photo Sharing App with DevSecOps, Terraform, AWS EKS, and Chaos Engineering involves creating a highly secure and… ☆25 · Aug 1, 2024 · Updated last year
- Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc. ☆52 · Feb 3, 2026 · Updated last month
- Rust And Delta Demo. Explanation and walkthrough on delta-rs ☆10 · Aug 21, 2023 · Updated 2 years ago