An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
☆316 · Feb 14, 2025 · Updated last year
Alternatives and similar repositories for e2e-data-engineering
Users who are interested in e2e-data-engineering are comparing it to the repositories listed below.
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ☆45 · Dec 11, 2023 · Updated 2 years ago
- This project showcases how to integrate the world of DevOps, focusing on Continuous Integration (CI) and Continuous Deployment (CD) with … ☆15 · Dec 27, 2023 · Updated 2 years ago
- This project provides an end-to-end data processing and visualization of visa numbers in Japan using PySpark and Plotly. The spark cluste… ☆12 · Oct 11, 2023 · Updated 2 years ago
- This project shows how to capture changes from a Postgres database and stream them into Kafka. ☆41 · May 17, 2024 · Updated last year
- This repository contains the necessary configuration files and DAGs (Directed Acyclic Graphs) for setting up a robust data engineering en… ☆25 · Jan 26, 2024 · Updated 2 years ago
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA… ☆44 · Jan 4, 2024 · Updated 2 years ago
- In this project, we set up end-to-end data engineering using Apache Spark, Azure Databricks, and Data Build Tool (DBT), using Azure as our … ☆39 · Dec 18, 2023 · Updated 2 years ago
- This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data wareh… ☆206 · Oct 23, 2023 · Updated 2 years ago
- This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessar… ☆48 · Dec 4, 2023 · Updated 2 years ago
- An end-to-end data engineering pipeline that fetches real-time YouTube analytics and streams them through Kafka for processing with ksqlD… ☆16 · Sep 19, 2023 · Updated 2 years ago
- This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project de…