End-to-end data pipeline that ingests, processes, and stores data. Apache Airflow schedules scripts that fetch data from an API and send it to Kafka; Spark then processes the data and writes it to Cassandra. The pipeline is built with Python and Apache ZooKeeper and is containerized with Docker for easy deployment and scaling.
☆21 · Jul 26, 2024 · Updated last year
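The data flow described above can be sketched in plain Python. This is a minimal, hedged illustration, not the repository's code: an in-memory `deque` stands in for the Kafka topic, a dict keyed by primary key stands in for the Cassandra table, and the payload shape, `SCHEMA`, and every function name (`format_record`, `produce`, `parse`, `process_batch`) are hypothetical. In the real pipeline the produce step would presumably run as a scheduled Airflow task and the parse-and-write step as a Spark Structured Streaming job.

```python
import json
from collections import deque
from typing import Callable, Deque, Dict, List, Optional

# Hypothetical schema for one record: field name -> expected Python type.
# The project's actual fields are not documented here.
SCHEMA: Dict[str, type] = {"id": str, "first_name": str, "last_name": str, "email": str}


def format_record(raw: dict) -> dict:
    """Flatten one hypothetical API payload into a flat record."""
    return {
        "id": raw["id"],
        "first_name": raw["name"]["first"],
        "last_name": raw["name"]["last"],
        "email": raw["email"],
    }


def produce(topic: Deque[bytes], fetch: Callable[[], List[dict]]) -> int:
    """Airflow-task stand-in: fetch a batch and append JSON messages to `topic`.

    `topic` is an in-memory deque standing in for a Kafka topic; a real
    producer would send these bytes to a broker instead.
    """
    records = [format_record(raw) for raw in fetch()]
    for record in records:
        topic.append(json.dumps(record).encode("utf-8"))
    return len(records)


def parse(message: bytes) -> Optional[dict]:
    """Decode one message and check it against SCHEMA; None if malformed."""
    try:
        obj = json.loads(message.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(obj, dict):
        return None
    if any(not isinstance(obj.get(name), kind) for name, kind in SCHEMA.items()):
        return None
    return {name: obj[name] for name in SCHEMA}


def process_batch(topic: Deque[bytes], table: Dict[str, dict]) -> int:
    """Spark-job stand-in: drain `topic` as one micro-batch into `table`.

    `table` is a dict keyed by primary key, standing in for a Cassandra
    table; writing an existing key overwrites the row, like a Cassandra upsert.
    Returns the number of valid rows written.
    """
    written = 0
    while topic:
        row = parse(topic.popleft())
        if row is not None:
            table[row["id"]] = row
            written += 1
    return written


if __name__ == "__main__":
    def fake_fetch() -> List[dict]:
        # Stand-in for the HTTP request the scheduled task would make.
        return [{"id": "u1", "name": {"first": "Ada", "last": "Lovelace"},
                 "email": "ada@example.com"}]

    topic: Deque[bytes] = deque()
    table: Dict[str, dict] = {}
    produce(topic, fake_fetch)
    topic.append(b"not json")  # malformed message: skipped, not written
    print(process_batch(topic, table), table["u1"]["email"])  # prints: 1 ada@example.com
```

Injecting `fetch` keeps the producer testable without network access, and keying the table by `id` reflects that Cassandra writes are upserts: replaying the same message overwrites the row instead of duplicating it.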
Alternatives and similar repositories for e2e-structured-streaming
Users interested in e2e-structured-streaming compare it to the repositories listed below.
- Apache Airflow advanced functionalities examples ☆21 · Mar 22, 2024 · Updated last year
- Open Data Stack Platform: a collection of projects and pipelines built with open data stack tools for scalable, observable data platform… ☆22 · Dec 21, 2025 · Updated 2 months ago
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ☆45 · Dec 11, 2023 · Updated 2 years ago
- Fully dockerized Data Warehouse (DWH) using Airflow, dbt, PostgreSQL and dashboard using redash ☆25 · Nov 12, 2022 · Updated 3 years ago
- Graduation project | Data Lakehouse ☆36 · Feb 9, 2026 · Updated 3 weeks ago
- Uses Airflow, Postgres, Kafka, Spark, Cassandra, and GitHub Actions to establish an end-to-end data pipeline ☆30 · Oct 25, 2023 · Updated 2 years ago
- This project implements a Lakehouse Medallion Architecture using modern Data Stack tools such as Fivetran, Snowflake and dbt. The fictici… ☆14 · Sep 30, 2024 · Updated last year
- This is a demo project comparing two web scraping frameworks, Playwright and Selenium, using the new pipelining tool Dagster ☆15 · Sep 9, 2021 · Updated 4 years ago
- ☆13 · Sep 15, 2024 · Updated last year
- ☆10 · Jan 28, 2025 · Updated last year
- The Data Product Specification ☆11 · Jan 28, 2025 · Updated last year
- End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore,… ☆48 · Oct 14, 2024 · Updated last year
- The goal of this project is to analyse the impact of Covid-19 on the Aviation industry through data engineering processes using technolog… ☆13 · Jun 26, 2022 · Updated 3 years ago
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA… ☆44 · Jan 4, 2024 · Updated 2 years ago
- ☆10 · Nov 26, 2024 · Updated last year
- [SC2023] POMELO: Fine-grained Population Mapping from Coarse Census Counts and Open Geodata ☆13 · Aug 5, 2024 · Updated last year
- Official implementation of "Predicting building types using OSM" ☆13 · Feb 8, 2026 · Updated 3 weeks ago
- Cutting-edge, opinionated, and ambitious project builder for power users and researchers. ☆15 · Feb 2, 2026 · Updated last month
- A testing ground for Plotly Dash app development including app features and experimenting with dashboard visualizations. ☆10 · Oct 15, 2023 · Updated 2 years ago
- This repo demonstrates an Apache Arrow Flight server implementation in Kubernetes. ☆12 · Oct 25, 2024 · Updated last year
- DuckDB Copilot Extension ☆10 · Jan 12, 2026 · Updated last month
- A lightweight Snowflake emulator built with Go and DuckDB for local development and testing ☆24 · Jan 19, 2026 · Updated last month
- Modern GIS Web Client for JavaScript, based on MapboxGL-JS, OpenLayers, Leaflet ☆14 · Sep 16, 2022 · Updated 3 years ago
- A high-performance PDF summarization tool powered by Google's Gemma 3 LLM. Features parallel processing, async operations, and intelligen… ☆20 · Apr 12, 2025 · Updated 10 months ago
- ☆12 · Sep 23, 2023 · Updated 2 years ago
- ☆10 · May 5, 2022 · Updated 3 years ago
- Get map value via dot-delimited path or nil. ☆30 · Sep 9, 2014 · Updated 11 years ago
- ☆10 · Feb 2, 2024 · Updated 2 years ago
- End to end data engineering project with kafka, airflow, spark, postgres and docker. ☆108 · Jan 8, 2026 · Updated last month
- um its my portfolio? ☆16 · Feb 10, 2026 · Updated 2 weeks ago
- ☆12 · May 27, 2025 · Updated 9 months ago
- Building Data Science Solutions with Anaconda, published by Packt ☆18 · Feb 5, 2026 · Updated 3 weeks ago
- Kafka Connect: How to create a real time data pipeline using Change Data Capture (CDC) ☆13 · Jan 24, 2021 · Updated 5 years ago
- ☆14 · Jul 26, 2022 · Updated 3 years ago
- ☆12 · Mar 7, 2025 · Updated 11 months ago
- This repository is all you need to understand how to build Gen AI products or AI agents ☆56 · Feb 4, 2026 · Updated 3 weeks ago
- Resources for the GPS spoofing detection project ☆12 · Apr 24, 2023 · Updated 2 years ago
- Example project for building scalable data pipelines with Kedro and Ibis. ☆13 · Dec 10, 2025 · Updated 2 months ago
- Stock Advisor ☆12 · Jun 13, 2025 · Updated 8 months ago