akarce / e2e-structured-streaming
End-to-end data pipeline that ingests, processes, and stores data. Apache Airflow schedules Python scripts that fetch data from an API and send it to Kafka; Spark then processes the data and writes the results to Cassandra. The stack also includes Apache ZooKeeper, which Kafka uses for cluster coordination, and is containerized with Docker for easy deployment and scaling.
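The data flow described above (fetch, publish to a topic, transform, store) can be sketched without any of the real services. The following is a minimal, stdlib-only illustration, not the project's code: it assumes a `queue.Queue` as a stand-in for the Kafka topic, a plain function as a stand-in for the Spark job, and a dict keyed by primary key as a stand-in for the Cassandra table. The record fields (`id`, `first_name`, `last_name`) and all function names are hypothetical, and Airflow scheduling is left out.

```python
"""Hypothetical, in-memory sketch of the fetch -> Kafka -> Spark -> Cassandra flow.

Stand-ins (assumptions, not the project's code):
- fetch_records() replaces the API call an Airflow task would make.
- a queue.Queue replaces the Kafka topic.
- transform() replaces the Spark processing step.
- a dict keyed by "id" replaces the Cassandra table.
"""
import json
import queue
from typing import Dict, List


def fetch_records() -> List[dict]:
    # Hypothetical API payload; a real task would call an HTTP endpoint.
    return [
        {"id": "u1", "first_name": "ada", "last_name": "lovelace"},
        {"id": "u2", "first_name": "alan", "last_name": "turing"},
    ]


def produce(topic: "queue.Queue[bytes]", records: List[dict]) -> None:
    # Kafka message values are bytes, so serialize each record as JSON.
    for record in records:
        topic.put(json.dumps(record).encode("utf-8"))


def transform(message: bytes) -> dict:
    # Parse one message and derive a new column, as a processing job might.
    row = json.loads(message.decode("utf-8"))
    row["full_name"] = f"{row['first_name'].title()} {row['last_name'].title()}"
    return row


def consume(topic: "queue.Queue[bytes]", table: Dict[str, dict]) -> int:
    # Drain the topic and upsert each row by primary key
    # (a Cassandra INSERT with an existing key overwrites the row).
    written = 0
    while not topic.empty():
        row = transform(topic.get())
        table[row["id"]] = row
        written += 1
    return written


def run_pipeline() -> Dict[str, dict]:
    topic: "queue.Queue[bytes]" = queue.Queue()
    table: Dict[str, dict] = {}
    produce(topic, fetch_records())
    consume(topic, table)
    return table


if __name__ == "__main__":
    for key, row in run_pipeline().items():
        print(key, row["full_name"])
```

Keying the stand-in table by `id` mirrors the upsert behaviour of Cassandra writes, so replaying the same messages leaves the table unchanged rather than duplicating rows.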
☆20 · Updated 11 months ago
Alternatives and similar repositories for e2e-structured-streaming
Users who are interested in e2e-structured-streaming are comparing it to the repositories listed below.
- Code for "Efficient Data Processing in Spark" Course ☆323 · Updated last month
- Code for dbt tutorial ☆156 · Updated last month
- Local Environment to Practice Data Engineering ☆143 · Updated 6 months ago
- End to end data engineering project ☆57 · Updated 2 years ago
- Open Source LeetCode for PySpark, Spark, Pandas and DBT/Snowflake ☆198 · Updated 2 weeks ago
- Sample project to demonstrate data engineering best practices ☆194 · Updated last year
- Code for "Advanced data transformations in SQL" free live workshop ☆82 · Updated 2 months ago
- Build a data warehouse (DW) with dbt ☆47 · Updated 8 months ago
- End to end data engineering project with Kafka, Airflow, Spark, Postgres and Docker. ☆98 · Updated 3 months ago
- End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore,… ☆42 · Updated 9 months ago
- Building a Data Pipeline with an Open Source Stack ☆55 · Updated 2 weeks ago
- A portable Datamart and Business Intelligence suite built with Docker, Dagster, dbt, DuckDB and Superset ☆234 · Updated 5 months ago
- In this repository we store all materials for dlt workshops, courses, etc. ☆204 · Updated last week
- This repository contains the code for a real-time election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ☆41 · Updated last year
- Code for blog post at: https://www.startdataengineering.com/post/docker-for-de/ ☆38 · Updated last year
- This repo contains "Databricks Certified Data Engineer Professional" questions and related docs. ☆85 · Updated 11 months ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, MinIO, Trino and a Hive Metastore. Can be used for local testin… ☆73 · Updated last year
- Simple stream processing pipeline ☆103 · Updated last year
- A self-contained, ready-to-run Airflow ELT project. Can be run locally or within Codespaces. ☆74 · Updated last year
- Open Data Stack Platform: a collection of projects and pipelines built with open data stack tools for scalable, observable data platform… ☆20 · Updated this week
- Generate synthetic Spotify music stream dataset to create dashboards. Spotify API generates fake event data emitted to Kafka. Spark consu… ☆68 · Updated last year
- A template repository to create a data project with IaC, CI/CD, data migrations, & testing ☆268 · Updated last year
- A demonstration of an ELT (Extract, Load, Transform) pipeline ☆29 · Updated last year
- Specialized-subject essay (Tiểu Luận Chuyên Ngành) ☆17 · Updated last year
- Dagster University courses ☆93 · Updated this week
- Code snippets for the Data Engineering Design Patterns book ☆128 · Updated 3 months ago
- Pipeline that extracts data from the Spotify API to build a more detailed version of Spotify Wrapped ☆36 · Updated last year
- Slow & local data allows you to move fast and deliver business value for 99.9% of data challenges. ☆252 · Updated 3 months ago
- ☆28 · Updated last year
- ☆134 · Updated 5 months ago