An end-to-end data pipeline that ingests, processes, and stores data. Apache Airflow schedules scripts that fetch data from an API and send it to Kafka; Spark then processes the data and writes it to Cassandra. The pipeline is built with Python and Apache Zookeeper and is containerized with Docker for easy deployment and scalability.
☆21 Jul 26, 2024 Updated last year
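As a rough illustration of the fetch-and-publish stage in the description above, the sketch below flattens records from an API response and serializes them as JSON bytes, the usual payload form for a Kafka topic. The record shape, the function names, and the injected `fetch` and `send` callables are all hypothetical and are not taken from the repository; in a real deployment `send` would wrap a Kafka producer call and the step would be triggered by an Airflow task.

```python
import json
from typing import Any, Callable, Dict, Iterable, List


def format_record(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one API record (hypothetical shape) into a flat dict."""
    name = raw.get("name", {})
    return {
        "id": str(raw.get("id", "")),
        "first_name": name.get("first", ""),
        "last_name": name.get("last", ""),
        "email": raw.get("email", ""),
    }


def to_messages(records: Iterable[Dict[str, Any]]) -> List[bytes]:
    """Serialize flattened records as UTF-8 JSON, one message per record."""
    return [
        json.dumps(format_record(r), sort_keys=True).encode("utf-8")
        for r in records
    ]


def run_step(
    fetch: Callable[[], Iterable[Dict[str, Any]]],
    send: Callable[[bytes], None],
) -> int:
    """Fetch records, publish each one via `send`, and return the count.

    `fetch` and `send` are injected so the step can be exercised without
    a live API or message broker (an assumption made for this sketch).
    """
    messages = to_messages(fetch())
    for message in messages:
        send(message)
    return len(messages)


if __name__ == "__main__":
    sample = [
        {"id": 1, "name": {"first": "Ada", "last": "Lovelace"},
         "email": "ada@example.com"},
    ]
    outbox: List[bytes] = []
    count = run_step(lambda: sample, outbox.append)
    print(count, outbox[0].decode("utf-8"))
```

Injecting the two callables keeps the transformation logic testable on its own; the downstream Spark and Cassandra stages are outside the scope of this sketch.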
Alternatives and similar repositories for e2e-structured-streaming
Users who are interested in e2e-structured-streaming are comparing it to the libraries listed below.
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ☆48 Dec 11, 2023 Updated 2 years ago
- Fully dockerized Data Warehouse (DWH) using Airflow, dbt, PostgreSQL and dashboard using redash ☆26 Nov 12, 2022 Updated 3 years ago
- Apache Airflow advanced functionalities examples ☆21 Mar 22, 2024 Updated 2 years ago
- End-to-End BI & DW project: Data Warehousing design and modeling (MySQL), ETL (PDI) and Dashboard (Tableau) ☆17 Aug 10, 2020 Updated 5 years ago
- Open Data Stack Platform: a collection of projects and pipelines built with open data stack tools for scalable, observable data platform… ☆22 May 11, 2026 Updated last month
- The goal of this project is to analyse the impact of Covid-19 on the Aviation industry through data engineering processes using technolog… ☆13 Jun 26, 2022 Updated 3 years ago
- ☆12 Mar 6, 2021 Updated 5 years ago
- ☆13 Sep 15, 2024 Updated last year
- A curated list of awesome Python frameworks, libraries, software and resources ☆15 Jun 6, 2018 Updated 8 years ago
- Graduation project | Data Lakehouse ☆44 Feb 9, 2026 Updated 4 months ago
- NSCollectionView sample for OS X 10.11 ElCapitan ☆12 Nov 24, 2017 Updated 8 years ago
- Underlying package for the 10-line cta ☆15 Updated this week
- ☆10 Aug 20, 2024 Updated last year
- Glue ETL job or EMR Spark that gets from data catalog, modifies and uploads to S3 and Data Catalog ☆13 Aug 26, 2023 Updated 2 years ago
- ☆13 Sep 23, 2023 Updated 2 years ago
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA… ☆45 Jan 4, 2024 Updated 2 years ago
- Modern GIS Web Client for JavaScript, based on MapboxGL-JS, OpenLayers, Leaflet ☆13 Sep 16, 2022 Updated 3 years ago
- ☆23 Jul 8, 2025 Updated 11 months ago
- End to end data engineering project with kafka, airflow, spark, postgres and docker. ☆114 Jan 8, 2026 Updated 5 months ago
- ☆16 Feb 11, 2026 Updated 4 months ago
- View data on a tile38 server ☆14 Aug 18, 2024 Updated last year
- ☆17 Nov 27, 2025 Updated 6 months ago
- An example of a project generated with cookiecutter-uv ☆16 Apr 10, 2026 Updated 2 months ago
- Create a streaming data, transfer it to Kafka, modify it with PySpark, take it to ElasticSearch and MinIO ☆65 Jul 21, 2023 Updated 2 years ago
- This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project de… ☆12 Nov 18, 2023 Updated 2 years ago
- ☆22 Mar 15, 2011 Updated 15 years ago
- 🚀 A simple javascript template for rapid development of GitHub actions. ☆17 Feb 24, 2023 Updated 3 years ago
- DuckDB Copilot Extension ☆10 Jan 12, 2026 Updated 5 months ago
- Transformer Conformal Prediction for Time Series ☆18 Apr 13, 2026 Updated 2 months ago
- I will share DSA notes and code here ☆19 Mar 24, 2023 Updated 3 years ago
- Stock Advisor ☆12 Jun 13, 2025 Updated last year
- ☆11 Feb 7, 2024 Updated 2 years ago
- A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from loc… ☆24 May 14, 2022 Updated 4 years ago
- A Proxy service using FastAPI and Protocol Buffers (Proto3) ☆13 Jun 17, 2023 Updated 2 years ago
- This repo demonstrates an Apache Arrow Flight server implementation in Kubernetes. ☆12 Oct 25, 2024 Updated last year
- Sample for using the Elasticsearch Vector Tiles Search API ☆11 Apr 29, 2022 Updated 4 years ago
- Execution of DBT models using Apache Airflow through Docker Compose ☆132 Jan 3, 2023 Updated 3 years ago
- Get map value via dot-delimited path or nil. ☆30 Sep 9, 2014 Updated 11 years ago
- A package that can be used to generate vector tiles using NTS. ☆14 Nov 22, 2022 Updated 3 years ago