An end-to-end data pipeline that ingests, processes, and stores data. Apache Airflow schedules scripts that fetch data from an API and send it to Kafka; Spark then processes the data and writes it to Cassandra. The pipeline is built with Python and Apache Zookeeper and is containerized with Docker for easy deployment and scalability.
☆21 Jul 26, 2024 Updated last year
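The flow described above (fetch from an API, publish to a stream, process, store) can be illustrated with a toy, in-memory sketch. This is not the repository's code: every function name and the sample records are hypothetical, and a `deque` and a `dict` stand in for a Kafka topic and a Cassandra table so that the example runs without any external services.

```python
"""Toy sketch of the ingest -> stream -> process -> store flow.

All names and data are hypothetical stand-ins; a real deployment would
use Airflow, Kafka, Spark, and Cassandra in place of these functions.
"""
from collections import deque


def fetch_from_api():
    # Stand-in for the API call that a scheduled Airflow task would make.
    return [
        {"id": 1, "first_name": "ada", "last_name": "lovelace"},
        {"id": 2, "first_name": "alan", "last_name": "turing"},
    ]


def publish(topic, records):
    # Stand-in for a Kafka producer: append each record to the topic.
    for record in records:
        topic.append(record)


def process(topic):
    # Stand-in for the Spark job: drain the topic and reshape each record.
    while topic:
        record = topic.popleft()
        full_name = f"{record['first_name']} {record['last_name']}".title()
        yield {"id": record["id"], "full_name": full_name}


def store(table, rows):
    # Stand-in for a Cassandra write: upsert each row by its primary key.
    for row in rows:
        table[row["id"]] = row


def run_pipeline():
    topic = deque()  # plays the role of a Kafka topic
    table = {}       # plays the role of a Cassandra table
    publish(topic, fetch_from_api())
    store(table, process(topic))
    return table


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping each stage behind its own function mirrors how the real components are decoupled: any one stand-in could be swapped for a real client without touching the others.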
Alternatives and similar repositories for e2e-structured-streaming
Users that are interested in e2e-structured-streaming are comparing it to the libraries listed below
- Apache Airflow advanced functionalities examples ☆21 Mar 22, 2024 Updated 2 years ago
- This project implements a Lakehouse Medallion Architecture using modern Data Stack tools such as Fivetran, Snowflake and dbt. The fictici… ☆14 Sep 30, 2024 Updated last year
- Open Data Stack Platform: a collection of projects and pipelines built with open data stack tools for scalable, observable data platform… ☆22 Mar 6, 2026 Updated 2 weeks ago
- SQL Tutorials using Jupyter Notebook ☆17 Apr 9, 2023 Updated 2 years ago
- This is a demo project that compares two web scraping frameworks, Playwright and Selenium, and uses the new pipelining tool Dagster ☆15 Sep 9, 2021 Updated 4 years ago
- Deep research agentic system using Time Test Diffusion ☆45 Dec 11, 2025 Updated 3 months ago
- This project aims to move the data from a Relational database system (RDBMS) to a Hadoop file system (HDFS) ☆11 Apr 29, 2022 Updated 3 years ago
- Cutting-edge, opinionated, and ambitious project builder for power users and researchers. ☆16 Feb 2, 2026 Updated last month
- An Objective-C library for uploading shots to Dribbble. ☆13 Mar 27, 2012 Updated 13 years ago
- A testing ground for Plotly Dash app development including app features and experimenting with dashboard visualizations. ☆10 Oct 15, 2023 Updated 2 years ago
- Graduation project | Data Lakehouse ☆39 Feb 9, 2026 Updated last month
- NSCollectionView sample for OS X 10.11 El Capitan ☆12 Nov 24, 2017 Updated 8 years ago
- ☆67 Sep 24, 2025 Updated 5 months ago
- ☆10 Feb 2, 2024 Updated 2 years ago
- A data pipeline moving data from a Relational database system (RDBMS) to a Hadoop file system (HDFS). ☆15 Jun 3, 2021 Updated 4 years ago
- ☆10 Jul 19, 2020 Updated 5 years ago
- Glue ETL job or EMR Spark that gets from data catalog, modifies and uploads to S3 and Data Catalog ☆13 Aug 26, 2023 Updated 2 years ago
- ☆12 Sep 23, 2023 Updated 2 years ago
- A platform that helps developers to better understand CSS through declaration interpretation and may even improve them through suggestion… ☆14 Jul 3, 2021 Updated 4 years ago
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA… ☆43 Jan 4, 2024 Updated 2 years ago
- Modern GIS Web Client for JavaScript, based on MapboxGL-JS, OpenLayers, Leaflet ☆14 Sep 16, 2022 Updated 3 years ago
- End-to-end data engineering project with Kafka, Airflow, Spark, Postgres and Docker. ☆110 Jan 8, 2026 Updated 2 months ago
- TTS utility ☆12 Aug 2, 2020 Updated 5 years ago
- um its my portfolio? ☆16 Feb 10, 2026 Updated last month
- View data on a tile38 server ☆14 Aug 18, 2024 Updated last year
- ☆16 Nov 27, 2025 Updated 3 months ago
- An example of a project generated with cookiecutter-uv ☆15 Dec 9, 2025 Updated 3 months ago
- Create streaming data, transfer it to Kafka, modify it with PySpark, and send it to Elasticsearch and MinIO ☆65 Jul 21, 2023 Updated 2 years ago
- End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore,… ☆48 Oct 14, 2024 Updated last year
- [SC2023] POMELO: Fine-grained Population Mapping from Coarse Census Counts and Open Geodata ☆13 Aug 5, 2024 Updated last year
- 🚀 A simple JavaScript template for rapid development of GitHub Actions. ☆17 Feb 24, 2023 Updated 3 years ago
- ☆22 Mar 15, 2011 Updated 15 years ago
- ☆26 Aug 28, 2023 Updated 2 years ago
- DuckDB Copilot Extension ☆10 Jan 12, 2026 Updated 2 months ago
- Create and Run 🚀 Dotfiles projects for Windows 10/11 ☆23 Jan 26, 2025 Updated last year
- Stock Advisor ☆12 Jun 13, 2025 Updated 9 months ago
- ☆11 Feb 7, 2024 Updated 2 years ago
- A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from loc… ☆23 May 14, 2022 Updated 3 years ago
- A Proxy service using FastAPI and Protocol Buffers (Proto3) ☆13 Jun 17, 2023 Updated 2 years ago