akarce / e2e-structured-streaming
End-to-end data pipeline that ingests, processes, and stores data. Apache Airflow schedules scripts that fetch data from an API and send it to Kafka; Spark then processes the stream before writing the results to Cassandra. The pipeline, built with Python and coordinated by Apache Zookeeper, is containerized with Docker for easy deployment and scaling.
☆19 · Updated last year
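The stack described above (Zookeeper, Kafka, Spark, Cassandra, Airflow) can be sketched as a minimal docker-compose file. This is an illustrative assumption, not the repository's actual compose file; the image tags, ports, and single-broker settings are placeholder choices for local testing only:

```yaml
version: "3.8"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:7.4.0
    depends_on: [zookeeper]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      # Single-broker dev setting; raise for real deployments.
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  spark:
    image: bitnami/spark:3.5
    depends_on: [kafka]

  cassandra:
    image: cassandra:4.1
    ports: ["9042:9042"]

  airflow:
    image: apache/airflow:2.9.2
    command: standalone   # dev-only all-in-one mode
    depends_on: [kafka]
    ports: ["8080:8080"]
```

With a layout like this, the Airflow DAG would produce to `kafka:9092` and a Spark structured-streaming job would read from the same broker and write to `cassandra:9042`, all on the default compose network.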
Alternatives and similar repositories for e2e-structured-streaming
Users interested in e2e-structured-streaming are comparing it to the libraries listed below.
- End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore,… ☆44 · Updated 11 months ago
- Local Environment to Practice Data Engineering ☆141 · Updated 9 months ago
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ☆41 · Updated last year
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin… ☆74 · Updated 2 years ago
- End-to-end data engineering project ☆57 · Updated 2 years ago
- Sample project to demonstrate data engineering best practices ☆198 · Updated last year
- Code for dbt tutorial ☆161 · Updated 3 weeks ago
- Code for blog at: https://www.startdataengineering.com/post/docker-for-de/ ☆40 · Updated last year
- Notebooks to learn Databricks Lakehouse Platform ☆35 · Updated last month
- Dagster University courses ☆111 · Updated last week
- Code snippets for Data Engineering Design Patterns book ☆207 · Updated 6 months ago
- Code for "Efficient Data Processing in Spark" Course ☆339 · Updated 4 months ago
- Realtime Data Engineering Project ☆30 · Updated 8 months ago
- Code for blog at https://www.startdataengineering.com/post/python-for-de/ ☆87 · Updated last year
- A demonstration of an ELT (Extract, Load, Transform) pipeline ☆30 · Updated last year
- build dw with dbt ☆46 · Updated 11 months ago
- Building a Data Pipeline with an Open Source Stack ☆54 · Updated 3 months ago
- End-to-end data engineering project with Kafka, Airflow, Spark, Postgres and Docker ☆103 · Updated 6 months ago
- Project for "Data pipeline design patterns" blog ☆46 · Updated last year
- A custom end-to-end analytics platform for customer churn ☆11 · Updated 4 months ago
- Open Data Stack Platform: a collection of projects and pipelines built with open data stack tools for scalable, observable data platform… ☆20 · Updated last week
- Simple stream processing pipeline ☆110 · Updated last year
- A portable Datamart and Business Intelligence suite built with Docker, Dagster, dbt, DuckDB and Superset ☆245 · Updated this week
- ☆69 · Updated this week
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke… ☆20 · Updated last month
- A self-contained, ready-to-run Airflow ELT project. Can be run locally or within Codespaces. ☆78 · Updated 2 years ago
- ☆40 · Updated 2 years ago
- Code for "Advanced data transformations in SQL" free live workshop ☆84 · Updated 5 months ago
- In this repository we store all materials for dlt workshops, courses, etc. ☆227 · Updated 3 weeks ago
- Repo for CDC with Debezium blog post ☆29 · Updated last year