An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
☆329Feb 14, 2025Updated last year
Alternatives and similar repositories for e2e-data-engineering
Users that are interested in e2e-data-engineering are comparing it to the libraries listed below.
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr…☆48Dec 11, 2023Updated 2 years ago
- This project showcases how to integrate the world of DevOps, focusing on Continuous Integration (CI) and Continuous Deployment (CD) with …☆14Dec 27, 2023Updated 2 years ago
- This project provides an end-to-end data processing and visualization of visa numbers in Japan using PySpark and Plotly. The spark cluste…☆11Oct 11, 2023Updated 2 years ago
- In this project, we set up end-to-end data engineering using Apache Spark, Azure Databricks, Data Build Tool (DBT) using Azure as our …☆39Dec 18, 2023Updated 2 years ago
- This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data wareh…☆215Oct 23, 2023Updated 2 years ago
- An end-to-end data engineering pipeline that fetches data from Wikipedia, cleans and transforms it with Apache Airflow and saves it on Az…☆30Oct 2, 2023Updated 2 years ago
- This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessar…☆51Dec 4, 2023Updated 2 years ago
- This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project de…☆12Nov 18, 2023Updated 2 years ago
- Demo of using Airflow☆11Jun 24, 2022Updated 3 years ago
- A Glue ETL job or EMR Spark job that reads from the Data Catalog, modifies the data, and uploads it to S3 and the Data Catalog☆13Aug 26, 2023Updated 2 years ago
- PySpark Tutorial for Beginners on Google Colab: Hands-On Guide☆17Sep 13, 2020Updated 5 years ago
- This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages using Python…☆48Mar 14, 2024Updated 2 years ago
- This project leverages Hadoop, Spark, SQL, and Hive for efficient data integration, transformation, warehousing, and analytics. It provid…☆23Sep 30, 2023Updated 2 years ago
- Produce Kafka messages, consume them and upload into Cassandra, MongoDB.☆43Sep 26, 2023Updated 2 years ago
- Personal Data Engineering Projects☆1,017Feb 8, 2023Updated 3 years ago
- A series that follows learning Apache Spark (PySpark), with quick tips and workarounds for everyday problems☆55Sep 30, 2023Updated 2 years ago
- ☆34Nov 25, 2023Updated 2 years ago
- End to end data engineering project with kafka, airflow, spark, postgres and docker.☆115Jan 8, 2026Updated 5 months ago
- Practical Data Engineering: A Hands-On Real-Estate Project Guide☆803Mar 10, 2026Updated 3 months ago
- Realtime Data Engineering Project☆31Jan 12, 2025Updated last year
- YouTube tutorial project☆108Oct 17, 2023Updated 2 years ago
- Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow.☆349Jan 12, 2022Updated 4 years ago
- A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousin…☆15Apr 29, 2021Updated 5 years ago
- 🥭 Designed and optimized a CNN architecture to accurately detect and classify 7 types of mango leaf diseases, reaching 99.21% test accur…☆14Nov 14, 2023Updated 2 years ago
- Practice your PySpark skills!☆106Oct 22, 2021Updated 4 years ago
- A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!☆881Apr 16, 2022Updated 4 years ago
- Docker with Airflow and Spark standalone cluster☆264Aug 5, 2023Updated 2 years ago
- ☆215Aug 13, 2023Updated 2 years ago
- An Awesome List of Open-Source Data Engineering Projects☆3,202Oct 4, 2024Updated last year
- Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Jo…☆42,113May 3, 2026Updated last month
- ☆216Jan 22, 2025Updated last year
- A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, Dbt, Polars, Doc…☆23Nov 19, 2024Updated last year
- Repository for Data Engineering Interview Series☆39Oct 17, 2024Updated last year
- This repository contains code and configuration files for an Extract, Transform, Load (ETL) project using Google Cloud Data Fusion for da…☆20Feb 23, 2024Updated 2 years ago
- ☆16Mar 9, 2026Updated 3 months ago
- ☆64Jan 9, 2024Updated 2 years ago
- Rust And Delta Demo. Explanation and walkthrough on delta-rs☆10Aug 21, 2023Updated 2 years ago
- Code for blog at: https://www.startdataengineering.com/post/docker-for-de/☆40Apr 29, 2024Updated 2 years ago
- An End-to-End ETL data pipeline that leverages pyspark parallel processing to process about 25 million rows of data coming from a SaaS ap…☆25Dec 7, 2022Updated 3 years ago