akarce / e2e-structured-streaming
End-to-end data pipeline that ingests, processes, and stores data. Apache Airflow schedules scripts that fetch data from an API and send it to Kafka; Spark then processes the stream before writing to Cassandra. Built with Python and Apache Zookeeper (which coordinates the Kafka brokers), the pipeline is containerized with Docker for easy deployment and scalability.
☆20 · Updated last year
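The core of such a pipeline is a Spark Structured Streaming job that consumes a Kafka topic and writes each micro-batch to Cassandra. Below is a minimal sketch of that leg, not the repository's actual code: the topic name (`users_created`), the JSON schema, the connector versions, and the Cassandra keyspace/table (`spark_streams.created_users`) are all hypothetical placeholders.

```python
# Sketch of the Kafka -> Spark Structured Streaming -> Cassandra leg.
# All names (topic, schema fields, keyspace, table) are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType

spark = (
    SparkSession.builder
    .appName("e2e-structured-streaming-sketch")
    # Connector packages must match your Spark/Scala version.
    .config("spark.jars.packages",
            "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1,"
            "com.datastax.spark:spark-cassandra-connector_2.12:3.4.1")
    .config("spark.cassandra.connection.host", "localhost")
    .getOrCreate()
)

# Expected shape of the JSON payload produced by the API-fetching script.
schema = StructType([
    StructField("id", StringType(), False),
    StructField("first_name", StringType(), True),
    StructField("last_name", StringType(), True),
    StructField("email", StringType(), True),
])

# Read the raw Kafka stream and parse the JSON value column.
stream_df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "users_created")
    .option("startingOffsets", "earliest")
    .load()
    .selectExpr("CAST(value AS STRING) AS value")
    .select(from_json(col("value"), schema).alias("data"))
    .select("data.*")
)

# Stream each micro-batch into Cassandra; keyspace and table must exist.
query = (
    stream_df.writeStream
    .format("org.apache.spark.sql.cassandra")
    .option("checkpointLocation", "/tmp/checkpoint")
    .option("keyspace", "spark_streams")
    .option("table", "created_users")
    .start()
)
query.awaitTermination()
```

Checkpointing is mandatory for a fault-tolerant sink like Cassandra; in a Dockerized deployment the checkpoint path would point at a mounted volume rather than /tmp.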
Alternatives and similar repositories for e2e-structured-streaming
Users interested in e2e-structured-streaming are comparing it to the repositories listed below
- End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore,… ☆47 · Updated last year
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin… ☆75 · Updated 2 years ago
- End-to-end data engineering project ☆58 · Updated 3 years ago
- Local Environment to Practice Data Engineering ☆144 · Updated last year
- Code for dbt tutorial ☆167 · Updated 5 months ago
- Sample project to demonstrate data engineering best practices ☆202 · Updated last year
- Build a data warehouse with dbt ☆50 · Updated last year
- Code for "Efficient Data Processing in Spark" Course☆360Updated 3 months ago
- A demonstration of an ELT (Extract, Load, Transform) pipeline☆31Updated last year
- A custom end-to-end analytics platform for customer churn☆11Updated 8 months ago
- Open Data Stack Platform: a collection of projects and pipelines built with open data stack tools for scalable, observable data platform…☆22Updated last month
- Code for blog at: https://www.startdataengineering.com/post/docker-for-de/☆40Updated last year
- In this repository we store all materials for dlt workshops, courses, etc.☆248Updated last month
- This repository serves as a comprehensive guide to effective data modeling and robust data quality assurance using popular open-source to…☆39Updated 2 years ago
- Code snippets for Data Engineering Design Patterns book☆331Updated last month
- Dagster University courses☆121Updated last week
- Simple stream processing pipeline☆110Updated last year
- Project for "Data pipeline design patterns" blog.☆50Updated last year
- Building a Data Pipeline with an Open Source Stack☆55Updated 7 months ago
- End-to-end data engineering project with Kafka, Airflow, Spark, Postgres, and Docker. ☆108 · Updated last month
- A template repository to create a data project with IAC, CI/CD, Data migrations, & testing ☆284 · Updated last year
- Code for "Advanced data transformations in SQL" free live workshop ☆89 · Updated 9 months ago
- A portable Datamart and Business Intelligence suite built with Docker, Dagster, dbt, DuckDB and Superset ☆258 · Updated last month
- A portable Datamart and Business Intelligence suite built with Docker, Airflow, dbt, DuckDB and Superset ☆46 · Updated last month
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke… ☆20 · Updated 5 months ago
- Notebooks to learn Databricks Lakehouse Platform ☆40 · Updated this week
- This repo contains "Databricks Certified Data Engineer Professional" questions and related docs. ☆131 · Updated last year
- Modern serverless lakehouse implementing HOOK methodology, Unified Star Schema (USS), and Analytical Data Storage System (ADSS) principle… ☆124 · Updated 10 months ago
- 🦆 Batch data pipeline with Airflow, DuckDB, Delta Lake, Trino, MinIO, and Metabase. Full observability and data quality. ☆85 · Updated 3 months ago
- Code for blog at https://www.startdataengineering.com/post/python-for-de/ ☆98 · Updated last year