airscholar / e2e-data-engineering
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
☆223 Updated last year
Alternatives and similar repositories for e2e-data-engineering:
Users who are interested in e2e-data-engineering are comparing it to the repositories listed below.
- This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data wareh… ☆108 Updated last year
- Data Engineering YouTube Analysis Project by Darshil Parmar ☆172 Updated last year
- ☆262 Updated 5 months ago
- YouTube tutorial project ☆99 Updated last year
- Stream processing pipeline from Finnhub websocket using Spark, Kafka, Kubernetes and more ☆317 Updated last year
- End-to-end data engineering project with Kafka, Airflow, Spark, Postgres and Docker. ☆76 Updated 5 months ago
- ☆145 Updated 2 years ago
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ☆34 Updated last year
- Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow ☆135 Updated 4 years ago
- Get data from API, run a scheduled script with Airflow, send data to Kafka and consume with Spark, then write to Cassandra ☆131 Updated last year
- Data Engineering examples for Airflow, Prefect, and Mage.ai; dbt for BigQuery, Redshift, ClickHouse, PostgreSQL; Spark/PySpark for Batch … ☆55 Updated this week
- Sample project to demonstrate data engineering best practices ☆175 Updated 11 months ago
- ☆130 Updated last year
- A template repository to create a data project with IAC, CI/CD, Data migrations, & testing ☆254 Updated 6 months ago
- Code for "Efficient Data Processing in Spark" Course ☆272 Updated 3 months ago
- Nyc_Taxi_Data_Pipeline - DE Project ☆92 Updated 3 months ago
- Welcome to my data engineering projects repository! Here you will find a collection of data engineering projects that I have worked on. ☆16 Updated last year
- This repository will contain all of the resources for the Mage component of the Data Engineering Zoomcamp: https://github.com/DataTalksCl… ☆98 Updated 5 months ago
- This is a template you can use for your next data engineering portfolio project. ☆174 Updated 3 years ago
- This project demonstrates how to use Apache Airflow to submit jobs to Apache Spark cluster in different programming languages using Python… ☆35 Updated 10 months ago
- This repo contains "Databricks Certified Data Engineer Associate" Questions and related docs. ☆110 Updated 5 months ago
- This repo contains all the code used in the Python for Data Engineering Course ☆243 Updated 9 months ago
- ☆190 Updated last year
- Pipeline that extracts data from Crinacle's Headphone and InEarMonitor databases and finalizes data for a Metabase Dashboard. The dashboa… ☆215 Updated 2 years ago
- Realtime Data Engineering Project ☆27 Updated 2 weeks ago
- Sample repo for startdataengineering DE 101 free course ☆45 Updated 7 months ago
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA… ☆32 Updated last year
- Roadmap for Data Engineering ☆218 Updated 7 months ago
- In this project, we set up end-to-end data engineering using Apache Spark, Azure Databricks, Data Build Tool (DBT) using Azure as our … ☆24 Updated last year
- Projects done in the Data Engineer Nanodegree Program by Udacity.com ☆105 Updated 2 years ago