End-to-end data pipeline that ingests, processes, and stores data. It uses Apache Airflow to schedule scripts that fetch data from an API, sends the data to Kafka, and processes it with Spark before writing to Cassandra. The pipeline, built with Python and Apache Zookeeper, is containerized with Docker for easy deployment and scalability.
☆21 · Jul 26, 2024 · Updated last year
Alternatives and similar repositories for e2e-structured-streaming
Users who are interested in e2e-structured-streaming are comparing it to the libraries listed below.
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ☆47 · Dec 11, 2023 · Updated 2 years ago
- Apache Airflow advanced functionalities examples ☆21 · Mar 22, 2024 · Updated 2 years ago
- End-to-End BI & DW project: Data Warehousing design and modeling (MySQL), ETL (PDI) and Dashboard (Tableau) ☆16 · Aug 10, 2020 · Updated 5 years ago
- This project implements a Lakehouse Medallion Architecture using modern Data Stack tools such as Fivetran, Snowflake and dbt. The fictici… ☆14 · Sep 30, 2024 · Updated last year
- Open Data Stack Platform: a collection of projects and pipelines built with open data stack tools for scalable, observable data platform… ☆22 · Mar 29, 2026 · Updated 2 weeks ago
- The goal of this project is to analyse the impact of Covid-19 on the Aviation industry through data engineering processes using technolog… ☆13 · Jun 26, 2022 · Updated 3 years ago
- SQL Tutorials using Jupyter Notebook ☆17 · Apr 9, 2023 · Updated 3 years ago
- ☆12 · Mar 6, 2021 · Updated 5 years ago
- A demonstration of an ELT (Extract, Load, Transform) pipeline ☆31 · Feb 19, 2024 · Updated 2 years ago
- ☆13 · Sep 15, 2024 · Updated last year
- This repo guides you step by step through creating a star schema dimensional model. ☆25 · Jun 1, 2021 · Updated 4 years ago
- Deep research agentic system using Time Test Diffusion ☆45 · Dec 11, 2025 · Updated 4 months ago
- A curated list of awesome Python frameworks, libraries, software and resources ☆15 · Jun 6, 2018 · Updated 7 years ago
- Cutting-edge, opinionated, and ambitious project builder for power users and researchers. ☆16 · Feb 2, 2026 · Updated 2 months ago
- Graduation project | Data Lakehouse ☆41 · Feb 9, 2026 · Updated 2 months ago
- A testing ground for Plotly Dash app development including app features and experimenting with dashboard visualizations. ☆10 · Oct 15, 2023 · Updated 2 years ago
- ☆10 · Feb 2, 2024 · Updated 2 years ago
- A data pipeline moving data from a Relational database system (RDBMS) to a Hadoop file system (HDFS). ☆15 · Jun 3, 2021 · Updated 4 years ago
- Underlying package for the 10-line cta ☆15 · Updated this week
- ☆11 · Aug 20, 2024 · Updated last year
- Glue ETL job or EMR Spark job that reads from the Data Catalog, modifies the data, and uploads it to S3 and the Data Catalog ☆13 · Aug 26, 2023 · Updated 2 years ago
- ☆12 · Sep 23, 2023 · Updated 2 years ago
- A collection of practice projects on big data topics, including Hadoop, Spark, Spark Streaming, and Kafka ☆11 · Mar 7, 2019 · Updated 7 years ago
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA… ☆44 · Jan 4, 2024 · Updated 2 years ago
- Modern GIS Web Client for JavaScript, based on MapboxGL-JS, OpenLayers, Leaflet ☆14 · Sep 16, 2022 · Updated 3 years ago
- ☆23 · Jul 8, 2025 · Updated 9 months ago
- End to end data engineering project with kafka, airflow, spark, postgres and docker. ☆110 · Jan 8, 2026 · Updated 3 months ago
- ☆16 · Feb 11, 2026 · Updated 2 months ago
- View data on a tile38 server ☆14 · Aug 18, 2024 · Updated last year
- ☆16 · Nov 27, 2025 · Updated 4 months ago
- Create streaming data, transfer it to Kafka, transform it with PySpark, and load it into Elasticsearch and MinIO ☆65 · Jul 21, 2023 · Updated 2 years ago
- End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore,… ☆48 · Oct 14, 2024 · Updated last year
- [SC2023] POMELO: Fine-grained Population Mapping from Coarse Census Counts and Open Geodata ☆13 · Aug 5, 2024 · Updated last year
- This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project de… ☆11 · Nov 18, 2023 · Updated 2 years ago
- 🚀 A simple javascript template for rapid development of GitHub actions. ☆17 · Feb 24, 2023 · Updated 3 years ago
- package for snow science data, providing streamlined access to satellite imagery (Sentinel-1/2, HLS, MODIS, etc), weather station data, c… ☆12 · Sep 12, 2025 · Updated 7 months ago
- code and demo for hierarchical stacking paper ☆10 · May 13, 2021 · Updated 4 years ago
- This project demonstrates how to integrate DuckLake, SQLMesh, and Neon PostgreSQL to create a modern data lakehouse architecture with ver… ☆28 · Jun 3, 2025 · Updated 10 months ago
- Create and Run 🚀 Dotfiles projects for Windows 10/11 ☆23 · Jan 26, 2025 · Updated last year