akarce / e2e-structured-streaming
End-to-end data pipeline that ingests, processes, and stores data. Apache Airflow schedules scripts that fetch data from an API and publish it to Kafka; Spark then processes the stream before writing the results to Cassandra. The pipeline, built with Python and coordinated by Apache Zookeeper, is containerized with Docker for easy deployment and scalability.
☆19 · Updated 9 months ago
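The repository's code isn't shown on this page, but the Kafka → Spark → Cassandra leg described above follows the standard Structured Streaming pattern. Below is a minimal, hypothetical sketch of that leg: the topic, keyspace, table, and schema names are illustrative assumptions rather than the repo's own, and the Kafka and Cassandra connector packages are assumed to be supplied at spark-submit time.

```python
# Hypothetical sketch of the Kafka -> Spark -> Cassandra leg described above.
# Topic, keyspace, table, and field names are illustrative, not the repo's.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

# Assumes spark-sql-kafka and spark-cassandra-connector are provided via
# --packages, and that Kafka/Cassandra run locally (e.g. in Docker).
spark = (
    SparkSession.builder
    .appName("e2e-structured-streaming-sketch")
    .config("spark.cassandra.connection.host", "localhost")
    .getOrCreate()
)

# Shape of the JSON messages on the Kafka topic (assumed).
schema = StructType([
    StructField("id", StringType()),
    StructField("name", StringType()),
])

# Read the raw byte stream from Kafka and parse the JSON payload.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "api_events")  # assumed topic name
    .option("startingOffsets", "earliest")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("data"))
    .select("data.*")
)

def write_to_cassandra(batch_df, batch_id):
    """Append each micro-batch to a pre-created Cassandra table."""
    (
        batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace="pipeline", table="events")  # assumed names
        .mode("append")
        .save()
    )

# foreachBatch reuses the batch Cassandra writer as a streaming sink;
# the checkpoint directory gives the query at-least-once recovery.
query = (
    events.writeStream
    .foreachBatch(write_to_cassandra)
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .start()
)
query.awaitTermination()
```

In this arrangement Airflow only needs a DAG that periodically runs the API-fetch script producing to Kafka; the streaming query itself runs continuously.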
Alternatives and similar repositories for e2e-structured-streaming:
Users interested in e2e-structured-streaming are comparing it to the libraries listed below.
- End-to-end data engineering project ☆54 · Updated 2 years ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin… ☆69 · Updated last year
- In this project, we set up an end-to-end data engineering pipeline using Apache Spark, Azure Databricks, Data Build Tool (DBT) using Azure as our … ☆27 · Updated last year
- This repo contains "Databricks Certified Data Engineer Professional" questions and related docs. ☆69 · Updated 8 months ago
- End-to-end data engineering project with Kafka, Airflow, Spark, Postgres and Docker. ☆93 · Updated last month
- A demonstration of an ELT (Extract, Load, Transform) pipeline ☆29 · Updated last year
- ☆151 · Updated 2 years ago
- A custom end-to-end analytics platform for customer churn ☆11 · Updated 3 months ago
- Open Data Stack Platform: a collection of projects and pipelines built with open data stack tools for scalable, observable data platform… ☆16 · Updated 2 weeks ago
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke… ☆18 · Updated 7 months ago
- Near-real-time ETL to populate a dashboard. ☆72 · Updated 10 months ago
- Build a data warehouse with dbt ☆44 · Updated 6 months ago
- Code for the "Advanced data transformations in SQL" free live workshop ☆79 · Updated this week
- Code for the blog post at https://www.startdataengineering.com/post/python-for-de/ ☆74 · Updated 11 months ago
- Code for a dbt tutorial ☆157 · Updated 11 months ago
- Local Environment to Practice Data Engineering ☆143 · Updated 4 months ago
- A self-contained, ready-to-run Airflow ELT project. Can be run locally or within Codespaces. ☆67 · Updated last year
- Modern serverless lakehouse implementing HOOK methodology, Unified Star Schema (USS), and Analytical Data Storage System (ADSS) principle… ☆114 · Updated last month
- ☆28 · Updated last year
- Simple stream processing pipeline ☆102 · Updated 10 months ago
- Sample project to demonstrate data engineering best practices ☆189 · Updated last year
- Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow ☆144 · Updated 4 years ago
- This project shows how to capture changes from a Postgres database and stream them into Kafka ☆36 · Updated 11 months ago
- Building a Data Lakehouse with open-source technology. Supports an end-to-end data pipeline, from source data on AWS S3 to the Lakehouse, visualize a… ☆26 · Updated last year
- Realtime Data Engineering Project ☆29 · Updated 3 months ago
- Create streaming data, send it to Kafka, transform it with PySpark, and load it into Elasticsearch and MinIO ☆60 · Updated last year
- This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages using Python… ☆42 · Updated last year
- ☆30 · Updated 7 months ago
- Data Engineering examples for Airflow, Prefect; dbt for BigQuery, Redshift, ClickHouse, Postgres, DuckDB; PySpark for Batch processing; K… ☆64 · Updated 2 months ago
- In this repository we store all materials for dlt workshops, courses, etc. ☆164 · Updated 2 weeks ago