akarce / e2e-structured-streaming
End-to-end data pipeline that ingests, processes, and stores data. It uses Apache Airflow to schedule scripts that fetch data from an API, send it to Kafka, and process it with Spark before writing to Cassandra. The pipeline, built with Python and coordinated by Apache Zookeeper, is containerized with Docker for easy deployment and scalability.
☆20Updated last year
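The API → Kafka → Spark → Cassandra flow described above can be sketched in miniature. This is a hedged outline, not the repository's actual code: the message fields, topic name, and Cassandra schema here are assumptions for illustration.

```python
import json
import uuid


def to_row(message: str) -> dict:
    """Parse one Kafka message (JSON text) into a flat dict shaped like a
    hypothetical Cassandra table (id uuid, name text, email text).
    Field names are illustrative assumptions, not taken from the repo."""
    payload = json.loads(message)
    return {
        "id": str(uuid.uuid4()),
        "name": payload.get("name", ""),
        "email": payload.get("email", ""),
    }


# In the Spark job, a function like this would sit inside a Structured
# Streaming query, roughly (connector options and topic are assumptions):
#
#   df = (spark.readStream.format("kafka")
#         .option("kafka.bootstrap.servers", "broker:9092")
#         .option("subscribe", "users_created")
#         .load())
#   ... .writeStream.format("org.apache.spark.sql.cassandra") ...

if __name__ == "__main__":
    sample = json.dumps({"name": "Ada Lovelace", "email": "ada@example.com"})
    print(to_row(sample)["email"])  # ada@example.com
```

Keeping the parse/flatten step as a plain function like this makes it testable outside Spark, which is useful when the streaming job itself only runs inside Docker.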
Alternatives and similar repositories for e2e-structured-streaming
Users interested in e2e-structured-streaming are comparing it to the repositories listed below
- Local Environment to Practice Data Engineering☆143Updated 8 months ago
- Code for "Efficient Data Processing in Spark" Course☆336Updated 3 months ago
- End to end data engineering project☆57Updated 2 years ago
- Sample project to demonstrate data engineering best practices☆197Updated last year
- Code for blog at https://www.startdataengineering.com/post/python-for-de/☆85Updated last year
- End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore,…☆44Updated 10 months ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin…☆74Updated 2 years ago
- End to end data engineering project with kafka, airflow, spark, postgres and docker.☆102Updated 5 months ago
- A demonstration of an ELT (Extract, Load, Transform) pipeline☆30Updated last year
- Simple stream processing pipeline☆103Updated last year
- Building a Data Pipeline with an Open Source Stack☆55Updated 2 months ago
- Code for dbt tutorial☆159Updated 3 months ago
- Open Data Stack Platform: a collection of projects and pipelines built with open data stack tools for scalable, observable data platform…☆20Updated last month
- Code for blog at: https://www.startdataengineering.com/post/docker-for-de/☆39Updated last year
- build dw with dbt☆47Updated 10 months ago
- Code snippets for Data Engineering Design Patterns book☆151Updated 5 months ago
- A self-contained, ready to run Airflow ELT project. Can be run locally or within codespaces.☆76Updated 2 years ago
- Code for "Advanced data transformations in SQL" free live workshop☆83Updated 4 months ago
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke…☆20Updated 3 weeks ago
- A template repository to create a data project with IAC, CI/CD, Data migrations, & testing☆274Updated last year
- Delta-Lake, ETL, Spark, Airflow☆48Updated 2 years ago
- A custom end-to-end analytics platform for customer churn☆12Updated 3 months ago
- This repo contains "Databricks Certified Data Engineer Professional" Questions and related docs.☆105Updated last year
- Project for "Data pipeline design patterns" blog.☆45Updated last year
- Near real time ETL to populate a dashboard.☆72Updated last year
- In this repository we store all materials for dlt workshops, courses, etc.☆221Updated last week
- Produce Kafka messages, consume them and upload into Cassandra, MongoDB.☆42Updated last year
- In this project, we set up an end-to-end data engineering pipeline using Apache Spark, Azure Databricks, and Data Build Tool (dbt), using Azure as our …☆34Updated last year
- ☆119Updated last month
- Notebooks to learn Databricks Lakehouse Platform☆35Updated last week