guidok91 / spark-structured-streaming-kafka
Spark Structured Streaming data pipeline that processes movie ratings data in real-time.
☆13 Updated last week
Related projects
Alternatives and complementary repositories for spark-structured-streaming-kafka
- Spark data pipeline that processes movie ratings data. ☆27 Updated last week
- Delta Lake Documentation ☆46 Updated 5 months ago
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python ☆44 Updated last year
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆53 Updated last year
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC) ☆44 Updated last year
- A repository of sample code to show data quality checking best practices using Airflow. ☆72 Updated last year
- Delta-Lake, ETL, Spark, Airflow ☆44 Updated 2 years ago
- Demo for GitHub Universe 2022 ☆12 Updated last year
- Simple stream processing pipeline ☆92 Updated 5 months ago
- A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in … ☆21 Updated 2 years ago
- Docker Airflow - Contains a docker compose file for Airflow 2.0 ☆59 Updated 2 years ago
- Apache Flink/Apache Kafka streaming data analytics demonstration using Streaming Synthetic Sales Data Generator ☆11 Updated 5 months ago
- Apache Flink (Pyflink) and Related Projects ☆29 Updated 5 months ago
- Data Engineering examples for Airflow, Prefect, and Mage.ai; dbt for BigQuery, Redshift, ClickHouse, PostgreSQL; Spark/PySpark for Batch … ☆51 Updated last week
- The Python fake data producer for Apache Kafka® is a complete demo app allowing you to quickly produce JSON fake streaming datasets and … ☆83 Updated 6 months ago
- ☆60 Updated last week
- Sample project to demonstrate data engineering best practices ☆166 Updated 8 months ago
- Delta Lake examples ☆207 Updated last month
- A CLI tool to streamline getting started with Apache Airflow™ and managing multiple Airflow projects ☆195 Updated this week
- Building a Data Pipeline with an Open Source Stack ☆38 Updated 4 months ago
- ☆15 Updated 9 months ago
- Code snippets for Data Engineering Design Patterns book ☆40 Updated last week
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 Updated 2 years ago
- Example of how to leverage Apache Spark distributed capabilities to call REST-API using a UDF ☆50 Updated 2 years ago
- Complete data engineering pipeline running on Minikube Kubernetes, Argo CD, Spark, Trino, S3, Delta lake, Postgres+ Debezium CDC, MySQL,… ☆24 Updated 7 months ago
- A Python package that creates fine-grained dbt tasks on Apache Airflow ☆62 Updated last month
- Quick Guides from Dremio on Several topics ☆65 Updated 3 weeks ago
- This project shows how to capture changes from postgres database and stream them into kafka ☆31 Updated 6 months ago
- Course Material ☆22 Updated last year