akarce / e2e-structured-streaming
End-to-end data pipeline that ingests, processes, and stores data. It uses Apache Airflow to schedule scripts that fetch data from an API, send it to Kafka, and process it with Spark before writing to Cassandra. The pipeline, built with Python and coordinated by Apache Zookeeper, is containerized with Docker for easy deployment and scalability.
☆20 · Updated last year
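The stack described above (Zookeeper, Kafka, Spark, Cassandra, and Airflow, all containerized with Docker) can be sketched as a docker-compose file. This is a minimal illustrative sketch, not taken from the repository: the image tags, service names, and port mappings are assumptions.

```yaml
# Hypothetical docker-compose sketch of the pipeline stack.
# Image tags and service names are illustrative assumptions.
version: "3.8"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:7.4.0
    depends_on: [zookeeper]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  spark:
    image: bitnami/spark:3.4   # reads from Kafka, writes to Cassandra

  cassandra:
    image: cassandra:4.1
    ports: ["9042:9042"]

  airflow:
    image: apache/airflow:2.7.0
    command: standalone        # schedules the API-fetch scripts
    depends_on: [kafka]
```

In a layout like this, Airflow triggers the fetch scripts on a schedule, the scripts produce to Kafka, and a Spark Structured Streaming job consumes the topic and writes results to Cassandra.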
Alternatives and similar repositories for e2e-structured-streaming
Users interested in e2e-structured-streaming are comparing it to the repositories listed below.
- End-to-end data engineering project ☆57 · Updated 3 years ago
- Local Environment to Practice Data Engineering ☆144 · Updated last year
- Sample project to demonstrate data engineering best practices ☆204 · Updated last year
- Build a data warehouse with dbt ☆50 · Updated last year
- Code snippets for Data Engineering Design Patterns book ☆302 · Updated 2 weeks ago
- Code for blog at: https://www.startdataengineering.com/post/docker-for-de/ ☆41 · Updated last year
- Code for blog at https://www.startdataengineering.com/post/python-for-de/ ☆92 · Updated last year
- End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore,… ☆47 · Updated last year
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin… ☆75 · Updated 2 years ago
- Project for "Data pipeline design patterns" blog. ☆47 · Updated last year
- Code for dbt tutorial ☆165 · Updated 3 months ago
- A custom end-to-end analytics platform for customer churn ☆11 · Updated 7 months ago
- 🦆 Batch data pipeline with Airflow, DuckDB, Delta Lake, Trino, MinIO, and Metabase. Full observability and data quality. ☆82 · Updated last month
- In this project, we set up an end-to-end data engineering pipeline using Apache Spark, Azure Databricks, Data Build Tool (DBT), using Azure as our … ☆37 · Updated 2 years ago
- Code for "Efficient Data Processing in Spark" Course ☆353 · Updated 2 months ago
- This repository contains the code for a real-time election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ☆43 · Updated 2 years ago
- A demonstration of an ELT (Extract, Load, Transform) pipeline ☆31 · Updated last year
- Simple stream processing pipeline ☆110 · Updated last year
- A self-contained, ready-to-run Airflow ELT project. Can be run locally or within codespaces. ☆79 · Updated 2 years ago
- End-to-end data engineering project with Kafka, Airflow, Spark, Postgres, and Docker. ☆107 · Updated 9 months ago
- Notebooks to learn Databricks Lakehouse Platform ☆38 · Updated this week
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke… ☆20 · Updated 4 months ago
- Generate synthetic Spotify music stream dataset to create dashboards. Spotify API generates fake event data emitted to Kafka. Spark consu… ☆69 · Updated 2 years ago
- Code for "Advanced data transformations in SQL" free live workshop ☆88 · Updated 7 months ago
- A template repository to create a data project with IaC, CI/CD, data migrations, and testing ☆281 · Updated last year
- Building a Data Pipeline with an Open Source Stack ☆55 · Updated 6 months ago
- Dagster University courses ☆119 · Updated this week
- (untitled repository) ☆30 · Updated 2 years ago
- Nyc_Taxi_Data_Pipeline - DE Project ☆130 · Updated last year
- Real-time Data Engineering Project ☆30 · Updated 11 months ago