airscholar / e2e-data-engineering
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
☆268 · Updated 5 months ago
Alternatives and similar repositories for e2e-data-engineering
Users who are interested in e2e-data-engineering are comparing it to the repositories listed below.
- End to end data engineering project with kafka, airflow, spark, postgres and docker. ☆98 · Updated 4 months ago
- This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data wareh… ☆150 · Updated last year
- Stream processing pipeline from Finnhub websocket using Spark, Kafka, Kubernetes and more ☆354 · Updated last year
- ☆284 · Updated 11 months ago
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ☆41 · Updated last year
- Data Engineering YouTube Analysis Project by Darshil Parmar ☆200 · Updated last year
- ☆151 · Updated 3 years ago
- Get data from API, run a scheduled script with Airflow, send data to Kafka and consume with Spark, then write to Cassandra ☆141 · Updated 2 years ago
- YouTube tutorial project ☆105 · Updated last year
- ☆142 · Updated 2 years ago
- ☆203 · Updated last year
- Projects done in the Data Engineer Nanodegree Program by Udacity.com ☆161 · Updated 2 years ago
- Ultimate guide for mastering Spark Performance Tuning and Optimization concepts and for preparing for Data Engineering interviews ☆153 · Updated last year
- Python data repo, jupyter notebook, python scripts and data. ☆519 · Updated 7 months ago
- Sample repo for startdataengineering DE 101 free course ☆69 · Updated last year
- Pipeline that extracts data from Crinacle's Headphone and InEarMonitor databases and finalizes data for a Metabase Dashboard. The dashboa… ☆232 · Updated 2 years ago
- A template repository to create a data project with IAC, CI/CD, Data migrations, & testing ☆270 · Updated last year
- Generate synthetic Spotify music stream dataset to create dashboards. Spotify API generates fake event data emitted to Kafka. Spark consu… ☆68 · Updated last year
- ☆355 · Updated 6 months ago
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA… ☆38 · Updated last year
- This is a repository to demonstrate my details, skills, projects and to keep track of my progression in Data Analytics and Data Science t… ☆125 · Updated 6 months ago
- Nyc_Taxi_Data_Pipeline - DE Project ☆115 · Updated 9 months ago
- Code for blog at https://www.startdataengineering.com/post/python-for-de/ ☆81 · Updated last year
- A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more! ☆728 · Updated 3 years ago
- Welcome to my data engineering projects repository! Here you will find a collection of data engineering projects that I have worked on. ☆20 · Updated 2 years ago
- This repository will contain all of the resources for the Mage component of the Data Engineering Zoomcamp: https://github.com/DataTalksCl… ☆101 · Updated 11 months ago
- Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow ☆149 · Updated 5 years ago
- Data Engineering Project with Hadoop HDFS and Kafka ☆114 · Updated last year
- Local Environment to Practice Data Engineering ☆143 · Updated 7 months ago
- ☆354 · Updated 2 years ago