zekeriyyaa / Apache-Spark-Structured-Streaming-Via-Docker-Compose
☆13 Updated last year
Related projects
Alternatives and complementary repositories for Apache-Spark-Structured-Streaming-Via-Docker-Compose
- Simplifying Data Engineering and Analytics with Delta, published by Packt ☆21 Updated last year
- Code snippets for analytics sessions ☆33 Updated 2 years ago
- Dockerizing an Apache Spark Standalone Cluster ☆43 Updated 2 years ago
- ☆86 Updated 2 years ago
- Airflow Helm chart for AWS EKS ☆18 Updated 3 years ago
- ☆14 Updated 5 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆53 Updated last year
- ☆24 Updated last year
- A real-time streaming ETL pipeline for streaming and performing sentiment analysis on Twitter data using Apache Kafka, Apache Spark and D… ☆29 Updated 4 years ago
- ☆111 Updated 4 years ago
- ☆26 Updated 4 years ago
- A PySpark job to handle upserts, conversion to Parquet, and partition creation on S3 ☆26 Updated 4 years ago
- Simplify Big Data Analytics with Amazon EMR, published by Packt ☆14 Updated last year
- Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for A… ☆41 Updated 2 years ago
- Docker environment that spins up MongoDB replica set, Spark, and Jupyter Lab. Example code uses PySpark and the MongoDB Spark Connector. ☆40 Updated last year
- Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin… ☆24 Updated last year
- This repository contains ready-to-use notebook examples for a wide variety of use cases in Amazon EMR Studio. ☆48 Updated last year
- Data Engineering on GCP ☆30 Updated 2 years ago
- GitHub repository related to the course Mastering Elastic Map Reduce for Data Engineers ☆23 Updated 2 years ago
- Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary. ☆28 Updated last year
- A series that follows learning Apache Spark (PySpark), with quick tips and workarounds for everyday problems ☆42 Updated last year
- ☆19 Updated 6 years ago
- ☆38 Updated 4 months ago
- ☆21 Updated last year
- The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on… ☆26 Updated 2 years ago
- Serverless Analytics with Amazon Athena, published by Packt ☆25 Updated last year
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. ☆39 Updated 3 years ago
- ☆32 Updated 6 months ago
- Data Engineering with Spark and Delta Lake ☆89 Updated last year
- Stream processing with Azure Databricks ☆132 Updated last week