akarce / e2e-structured-streaming
End-to-end data pipeline that ingests, processes, and stores data. Apache Airflow schedules scripts that fetch data from an API and send it to Kafka; Spark processes the stream and writes the results to Cassandra. The pipeline, built with Python and coordinated by Apache Zookeeper, is containerized with Docker for easy deployment and scalability.
☆16 · Updated 8 months ago
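As a rough illustration of the data flow described above, the sketch below shows the Kafka-to-Cassandra leg as a PySpark Structured Streaming job. The topic name, message schema, keyspace, table, and hostnames are illustrative assumptions, not values taken from this repository.

```python
# Minimal sketch of the Kafka -> Spark -> Cassandra leg of such a pipeline.
# Requires the spark-sql-kafka and spark-cassandra-connector packages on the classpath.
# Topic, schema, keyspace, table, and hostnames below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType

spark = (
    SparkSession.builder
    .appName("e2e-structured-streaming-sketch")
    .config("spark.cassandra.connection.host", "cassandra")  # assumed Docker service name
    .getOrCreate()
)

# Assumed shape of the API events; the real project defines its own fields.
schema = StructType([
    StructField("id", StringType()),
    StructField("first_name", StringType()),
    StructField("last_name", StringType()),
])

# Read the raw events that the Airflow-scheduled producer pushed to Kafka.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker address
    .option("subscribe", "api_events")                 # assumed topic name
    .option("startingOffsets", "earliest")
    .load()
    .selectExpr("CAST(value AS STRING) AS json")
    .select(from_json(col("json"), schema).alias("data"))
    .select("data.*")
)

def write_to_cassandra(batch_df, batch_id):
    # Append each micro-batch to the target Cassandra table.
    (batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .mode("append")
        .options(keyspace="pipeline", table="api_events")  # assumed keyspace/table
        .save())

(events.writeStream
    .foreachBatch(write_to_cassandra)
    .option("checkpointLocation", "/tmp/checkpoints/api_events")
    .start()
    .awaitTermination())
```

A job like this would typically run inside the Dockerized Spark service, while Airflow schedules the producer that fetches from the API and publishes to the Kafka topic.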
Alternatives and similar repositories for e2e-structured-streaming:
Users interested in e2e-structured-streaming are comparing it to the repositories listed below.
- End-to-end data engineering project ☆53 · Updated 2 years ago
- Uses Airflow, Postgres, Kafka, Spark, Cassandra, and GitHub Actions to establish an end-to-end data pipeline ☆27 · Updated last year
- A self-contained, ready-to-run Airflow ELT project. Can be run locally or within Codespaces. ☆66 · Updated last year
- ☆27 · Updated last year
- Sample project to demonstrate data engineering best practices ☆184 · Updated last year
- In this project, we set up an end-to-end data engineering project using Apache Spark, Azure Databricks, and Data Build Tool (DBT), using Azure as our … ☆27 · Updated last year
- Code for "Advanced data transformations in SQL" free live workshop ☆75 · Updated 5 months ago
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke… ☆18 · Updated 6 months ago
- A custom end-to-end analytics platform for customer churn ☆11 · Updated 2 months ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin… ☆63 · Updated last year
- This repo contains "Databricks Certified Data Engineer Professional" questions and related docs. ☆63 · Updated 7 months ago
- Produces Kafka messages, consumes them, and loads them into Cassandra and MongoDB. ☆40 · Updated last year
- ☆40 · Updated 8 months ago
- End-to-end data engineering project with Kafka, Airflow, Spark, Postgres, and Docker. ☆86 · Updated last week
- Building a Data Pipeline with an Open Source Stack ☆50 · Updated 9 months ago
- Build a data warehouse with dbt ☆43 · Updated 5 months ago
- ☆10 · Updated 2 years ago
- Ultimate guide for mastering Spark Performance Tuning and Optimization concepts and for preparing for Data Engineering interviews ☆114 · Updated 10 months ago
- This repository contains the code for a real-time election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ☆35 · Updated last year
- Nyc_Taxi_Data_Pipeline - DE Project ☆103 · Updated 5 months ago
- Local Environment to Practice Data Engineering ☆143 · Updated 3 months ago
- ☆36 · Updated 2 years ago
- Code for "Efficient Data Processing in Spark" Course ☆290 · Updated 6 months ago
- Near-real-time ETL to populate a dashboard. ☆73 · Updated 9 months ago
- Project for the "Data pipeline design patterns" blog. ☆45 · Updated 7 months ago
- Generate a synthetic Spotify music stream dataset to create dashboards. Spotify API generates fake event data emitted to Kafka. Spark consu… ☆67 · Updated last year
- The resources of the preparation course for the Databricks Data Engineer Professional certification exam ☆109 · Updated last month
- Data Pipeline from the Global Historical Climatology Network Dataset ☆27 · Updated 2 years ago
- Code for dbt tutorial ☆155 · Updated 10 months ago
- Code for the blog post at: https://www.startdataengineering.com/post/docker-for-de/ ☆35 · Updated 11 months ago