akarce / e2e-structured-streaming
End-to-end data pipeline that ingests, processes, and stores data. Apache Airflow schedules scripts that fetch data from an API and send it to Kafka; Spark then processes the stream before writing to Cassandra. The pipeline, built with Python and coordinated by Apache Zookeeper, is containerized with Docker for easy deployment and scalability.
☆ 20 · Updated 11 months ago
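The API → Airflow → Kafka → Spark → Cassandra flow described above can be sketched in miniature. This is a minimal stdlib-only sketch of the data handoff between stages; every function and field name here is hypothetical (not taken from the repository), and the Kafka/Spark/Cassandra calls are replaced with stand-ins so only the shape of the flow is shown:

```python
# Hypothetical sketch of the pipeline stages: an Airflow-scheduled task fetches
# a record from an API, a transform flattens it into the row shape a Spark job
# would write to Cassandra, and the row is serialized as a Kafka message value.
# All names below are illustrative, not the repo's actual code.
import json
import uuid


def fetch_user() -> dict:
    # Stand-in for the API call an Airflow task would make (e.g. an HTTP GET).
    return {"name": {"first": "Ada", "last": "Lovelace"},
            "email": "ada@example.com"}


def format_record(raw: dict) -> dict:
    # Transform step: flatten the nested API payload into a flat row,
    # the kind of shape a Cassandra table would store.
    return {
        "id": str(uuid.uuid4()),
        "full_name": f"{raw['name']['first']} {raw['name']['last']}",
        "email": raw["email"],
    }


def to_kafka_message(row: dict) -> bytes:
    # Kafka producers send bytes; JSON-encode the row as the message value.
    return json.dumps(row).encode("utf-8")


if __name__ == "__main__":
    row = format_record(fetch_user())
    print(to_kafka_message(row))
```

In the real pipeline these stand-ins would be replaced by an HTTP client, a Kafka producer, and a Spark Structured Streaming job with a Cassandra sink; the point of the sketch is only the stage-to-stage data shape.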
Alternatives and similar repositories for e2e-structured-streaming
Users interested in e2e-structured-streaming are comparing it to the repositories listed below.
- End-to-end data engineering project ☆ 56 · Updated 2 years ago
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ☆ 41 · Updated last year
- End-to-end data engineering project with Kafka, Airflow, Spark, Postgres and Docker. ☆ 96 · Updated 3 months ago
- Project for the "Data pipeline design patterns" blog. ☆ 45 · Updated 10 months ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, MinIO, Trino and a Hive Metastore. Can be used for local testin… ☆ 72 · Updated last year
- A demonstration of an ELT (Extract, Load, Transform) pipeline ☆ 29 · Updated last year
- Build a data warehouse (DW) with dbt ☆ 46 · Updated 8 months ago
- Generate a synthetic Spotify music-stream dataset to create dashboards. The Spotify API generates fake event data emitted to Kafka. Spark consu… ☆ 68 · Updated last year
- Specialized thesis project (original title in Vietnamese: "Tiểu Luận Chuyên Ngành") ☆ 16 · Updated 11 months ago
- Realtime Data Engineering Project ☆ 31 · Updated 5 months ago
- Local Environment to Practice Data Engineering ☆ 142 · Updated 5 months ago
- End-to-end data platform: a PoC data platform project utilizing a modern data stack (Spark, Airflow, dbt, Trino, Lightdash, Hive metastore,… ☆ 41 · Updated 8 months ago
- Code for dbt tutorial ☆ 156 · Updated 3 weeks ago
- Code for blog at https://www.startdataengineering.com/post/python-for-de/ ☆ 78 · Updated last year
- Create streaming data, transfer it to Kafka, modify it with PySpark, and send it to Elasticsearch and MinIO ☆ 62 · Updated last year
- With everything I learned from the DE Zoomcamp by datatalks.club, this project performs batch processing on AWS for the cycling dataset wh… ☆ 14 · Updated 3 years ago
- Delta Lake, ETL, Spark, Airflow ☆ 47 · Updated 2 years ago
- Open Data Stack Platform: a collection of projects and pipelines built with open data stack tools for scalable, observable data platform… ☆ 17 · Updated last month
- A custom end-to-end analytics platform for customer churn ☆ 12 · Updated last month
- Code snippets for the Data Engineering Design Patterns book ☆ 119 · Updated 3 months ago
- Get data from an API, run a scheduled script with Airflow, send data to Kafka and consume it with Spark, then write to Cassandra ☆ 140 · Updated last year
- ☆ 41 · Updated 11 months ago
- A template repository to create a data project with IaC, CI/CD, data migrations, and testing ☆ 265 · Updated 11 months ago
- An example project demonstrating a data engineering workflow for tutorial purposes, by Pipeline To Insights. ☆ 10 · Updated 2 months ago
- Sample project to demonstrate data engineering best practices ☆ 194 · Updated last year
- ☆ 28 · Updated last year
- Used Airflow, Postgres, Kafka, Spark, Cassandra, and GitHub Actions to establish an end-to-end data pipeline ☆ 28 · Updated last year
- Near-real-time ETL to populate a dashboard. ☆ 72 · Updated last year
- Series follows learning Apache Spark (PySpark) with quick tips and workarounds for daily problems ☆ 53 · Updated last year
- Code for the "Advanced data transformations in SQL" free live workshop ☆ 82 · Updated last month