guidok91 / spark-structured-streaming-kafka
Spark Structured Streaming data pipeline that processes movie ratings data in real-time.
☆13 · Updated last week
Alternatives and similar repositories for spark-structured-streaming-kafka
Users interested in spark-structured-streaming-kafka are comparing it to the libraries listed below.
- Course notes for the Astronomer Certification DAG Authoring for Apache Airflow ☆53 · Updated last year
- Spark data pipeline that processes movie ratings data. ☆28 · Updated last week
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC) ☆58 · Updated last year
- Code snippets for Data Engineering Design Patterns book ☆119 · Updated 3 months ago
- A repository of sample code to show data quality checking best practices using Airflow. ☆77 · Updated 2 years ago
- A skeleton project for testing Airflow code ☆20 · Updated 3 years ago
- A Python package that creates fine-grained dbt tasks on Apache Airflow ☆70 · Updated 9 months ago
- Simple repo to demonstrate how to submit a spark job to EMR from Airflow ☆33 · Updated 4 years ago
- Materials for the next course ☆24 · Updated 2 years ago
- Docker with Airflow and Spark standalone cluster ☆258 · Updated last year
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. ☆38 · Updated 4 years ago
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data ☆46 · Updated last year
- Simple stream processing pipeline ☆102 · Updated last year
- ☆21 · Updated 3 months ago
- Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for A… ☆41 · Updated 2 years ago
- Airflow Providers containing Deferrable Operators & Sensors from Astronomer ☆149 · Updated last week
- Code for dbt tutorial ☆156 · Updated 3 weeks ago
- Airflow training for the crunch conf ☆105 · Updated 6 years ago
- A repository of sample code to accompany our blog post on Airflow and dbt. ☆174 · Updated last year
- A Docker Compose template that builds an interactive development environment for PySpark with Jupyter Lab, MinIO as object storage, Hive M… ☆46 · Updated 6 months ago
- Apache Flink (Pyflink) and Related Projects ☆39 · Updated 2 months ago
- Spark on Kubernetes using Helm ☆34 · Updated 5 years ago
- ☆50 · Updated 4 years ago
- The Python fake data producer for Apache Kafka® is a complete demo app allowing you to quickly produce JSON fake streaming datasets and … ☆85 · Updated last year
- Public source code for the Udemy online course Apache Airflow: Complete Hands-On Beginner to Advanced Class. ☆63 · Updated 4 years ago
- End to end data engineering project ☆56 · Updated 2 years ago
- Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture ☆83 · Updated last week
- Near real time ETL to populate a dashboard. ☆72 · Updated last year
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin… ☆72 · Updated last year
- Enforce Best Practices for all your Airflow DAGs. ⭐ ☆102 · Updated this week