zekeriyyaa / Apache-Spark-Structured-Streaming-Via-Docker-Compose
☆13 Updated last year
Alternatives and similar repositories for Apache-Spark-Structured-Streaming-Via-Docker-Compose:
Users who are interested in Apache-Spark-Structured-Streaming-Via-Docker-Compose are comparing it to the repositories listed below:
- Dockerizing an Apache Spark Standalone Cluster ☆43 Updated 2 years ago
- Spark data pipeline that processes movie ratings data. ☆28 Updated 3 weeks ago
- Docker environment that spins up MongoDB replica set, Spark, and Jupyter Lab. Example code uses PySpark and the MongoDB Spark Connector. ☆40 Updated 2 years ago
- Delta-Lake, ETL, Spark, Airflow ☆47 Updated 2 years ago
- Course Material ☆24 Updated 2 years ago
- Simplify Big Data Analytics with Amazon EMR, published by Packt ☆13 Updated 2 years ago
- This repo contains live examples to build Databricks' Lakehouse and recommended best practices from the field. ☆18 Updated 6 months ago
- This construct builds some elements for you to quickly launch an EMR Serverless application. After submitting the Emr Serverless job, you… ☆11 Updated this week
- Airflow helm chart for AWS EKS ☆18 Updated 4 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆54 Updated last year
- code snippet for analytics sessions ☆34 Updated 2 years ago
- Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary. ☆29 Updated last year
- ☆64 Updated this week
- Simplifying Data Engineering and Analytics with Delta, published by Packt ☆21 Updated last year
- Series follows learning from Apache Spark (PySpark) with quick tips and workaround for daily problems in hand ☆49 Updated last year
- ☆23 Updated 2 years ago
- ☆87 Updated 2 years ago
- GitHub repository related to the course Mastering Elastic Map Reduce for Data Engineers ☆24 Updated 2 years ago
- This repository contains ready-to-use notebook examples for a wide variety of use cases in Amazon EMR Studio. ☆50 Updated last year
- Amazon SageMaker Best Practices, published by Packt ☆29 Updated 2 years ago
- Spark app to merge different schemas ☆23 Updated 4 years ago
- ☆40 Updated 9 months ago
- Data Engineering with AWS Cookbook, published by Packt ☆18 Updated 4 months ago
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python ☆43 Updated 2 years ago
- Simple ETL pipeline using Python ☆26 Updated last year
- ☆25 Updated 4 years ago
- A Pyspark job to handle upserts, conversion to parquet and create partitions on S3 ☆26 Updated 4 years ago
- A tutorial for the Great Expectations library. ☆71 Updated 4 years ago
- Building Big Data Pipelines with Apache Beam, published by Packt ☆86 Updated 2 years ago
- A real-time streaming ETL pipeline for streaming and performing sentiment analysis on Twitter data using Apache Kafka, Apache Spark and D… ☆30 Updated 4 years ago