akarce / e2e-structured-streaming
End-to-end data pipeline that ingests, processes, and stores data. Apache Airflow schedules scripts that fetch data from an API and send it to Kafka; Spark then processes the stream before writing to Cassandra. The pipeline, built with Python and coordinated by Apache Zookeeper, is containerized with Docker for easy deployment and scalability.
☆ 20 · Updated 11 months ago
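The API → Airflow → Kafka → Spark → Cassandra flow described above can be sketched in miniature. This is a minimal stdlib-only sketch of the data handoff between stages; every function and field name here is hypothetical (not taken from the repository), and the Kafka/Spark/Cassandra calls are replaced with stand-ins so only the shape of the flow is shown:

```python
# Hypothetical sketch of the pipeline stages: an Airflow-scheduled task fetches
# a record from an API, a transform flattens it into the row shape a Spark job
# would write to Cassandra, and the row is serialized as a Kafka message value.
# All names below are illustrative, not the repo's actual code.
import json
import uuid


def fetch_user() -> dict:
    # Stand-in for the API call an Airflow task would make (e.g. an HTTP GET).
    return {"name": {"first": "Ada", "last": "Lovelace"},
            "email": "ada@example.com"}


def format_record(raw: dict) -> dict:
    # Transform step: flatten the nested API payload into a flat row,
    # the kind of shape a Cassandra table would store.
    return {
        "id": str(uuid.uuid4()),
        "full_name": f"{raw['name']['first']} {raw['name']['last']}",
        "email": raw["email"],
    }


def to_kafka_message(row: dict) -> bytes:
    # Kafka producers send bytes; JSON-encode the row as the message value.
    return json.dumps(row).encode("utf-8")


if __name__ == "__main__":
    row = format_record(fetch_user())
    print(to_kafka_message(row))
```

In the real pipeline these stand-ins would be replaced by an HTTP client, a Kafka producer, and a Spark Structured Streaming job with a Cassandra sink; the point of the sketch is only the stage-to-stage data shape.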
Alternatives and similar repositories for e2e-structured-streaming
Users interested in e2e-structured-streaming are comparing it to the repositories listed below.
- End-to-end data engineering project ☆ 56 · Updated 2 years ago
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ☆ 41 · Updated last year
- End-to-end data engineering project with Kafka, Airflow, Spark, Postgres and Docker. ☆ 96 · Updated 3 months ago
- Project for the "Data pipeline design patterns" blog. ☆ 45 · Updated 10 months ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, MinIO, Trino and a Hive Metastore. Can be used for local testin… ☆ 72 · Updated last year
- A demonstration of an ELT (Extract, Load, Transform) pipeline ☆ 29 · Updated last year
- Build a data warehouse (DW) with dbt ☆ 46 · Updated 8 months ago
- Generate a synthetic Spotify music-stream dataset to create dashboards. The Spotify API generates fake event data emitted to Kafka. Spark consu… ☆ 68 · Updated last year
- Specialized thesis project (original title in Vietnamese: "Tiểu Luận Chuyên Ngành") ☆ 16 · Updated 11 months ago
- Realtime Data Engineering Project ☆ 31 · Updated 5 months ago
- Local Environment to Practice Data Engineering ☆ 142 · Updated 5 months ago
- End-to-end data platform: a PoC data platform project utilizing a modern data stack (Spark, Airflow, dbt, Trino, Lightdash, Hive metastore,… ☆ 41 · Updated 8 months ago
- Code for dbt tutorial ☆ 156 · Updated 3 weeks ago
- Code for blog at https://www.startdataengineering.com/post/python-for-de/ ☆ 78 · Updated last year
- Create streaming data, transfer it to Kafka, modify it with PySpark, and send it to Elasticsearch and MinIO ☆ 62 · Updated last year
- With everything I learned from the DE Zoomcamp by datatalks.club, this project performs batch processing on AWS for the cycling dataset wh… ☆ 14 · Updated 3 years ago
- Delta Lake, ETL, Spark, Airflow ☆ 47 · Updated 2 years ago
- Open Data Stack Platform: a collection of projects and pipelines built with open data stack tools for scalable, observable data platform… ☆ 17 · Updated last month
- A custom end-to-end analytics platform for customer churn ☆ 12 · Updated last month
- Code snippets for the Data Engineering Design Patterns book ☆ 119 · Updated 3 months ago
- Get data from an API, run a scheduled script with Airflow, send data to Kafka and consume it with Spark, then write to Cassandra ☆ 140 · Updated last year
- ☆ 41 · Updated 11 months ago
- A template repository to create a data project with IaC, CI/CD, data migrations, and testing ☆ 265 · Updated 11 months ago
- An example project demonstrating a data engineering workflow for tutorial purposes, by Pipeline To Insights. ☆ 10 · Updated 2 months ago
- Sample project to demonstrate data engineering best practices ☆ 194 · Updated last year
- ☆ 28 · Updated last year
- Used Airflow, Postgres, Kafka, Spark, Cassandra, and GitHub Actions to establish an end-to-end data pipeline ☆ 28 · Updated last year
- Near-real-time ETL to populate a dashboard. ☆ 72 · Updated last year
- Series follows learning Apache Spark (PySpark) with quick tips and workarounds for daily problems ☆ 53 · Updated last year
- Code for the "Advanced data transformations in SQL" free live workshop ☆ 82 · Updated last month