zekeriyyaa / Apache-Spark-Structured-Streaming-Via-Docker-Compose
☆13 · Updated last year
Alternatives and similar repositories for Apache-Spark-Structured-Streaming-Via-Docker-Compose
Users who are interested in Apache-Spark-Structured-Streaming-Via-Docker-Compose are comparing it to the repositories listed below.
- Data Engineering with Spark and Delta Lake ☆104 · Updated 2 years ago
- Data engineering with dbt, published by Packt ☆86 · Updated last month
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆55 · Updated 2 years ago
- Productionalizing Data Pipelines with Apache Airflow ☆114 · Updated 3 years ago
- ☆88 · Updated 3 years ago
- Delta Lake Documentation ☆50 · Updated last year
- Materials for the next course ☆25 · Updated 2 years ago
- GitHub repository related to the course Mastering Elastic Map Reduce for Data Engineers ☆24 · Updated 3 years ago
- ☆16 · Updated 6 years ago
- Snowflake Cookbook, published by Packt ☆81 · Updated 2 years ago
- Airflow helm chart for AWS EKS ☆19 · Updated 4 years ago
- This repo contains live examples to build Databricks' Lakehouse and recommended best practices from the field. ☆22 · Updated 11 months ago
- ☆117 · Updated 5 years ago
- Spark data pipeline that processes movie ratings data. ☆30 · Updated last month
- PySpark functions and utilities with examples. Assists ETL process of data modeling ☆104 · Updated 4 years ago
- Resources for video demonstrations and blog posts related to DataOps on AWS ☆182 · Updated 3 years ago
- Docker environment that spins up MongoDB replica set, Spark, and Jupyter Lab. Example code uses PySpark and the MongoDB Spark Connector. ☆40 · Updated 2 years ago
- ☆48 · Updated 8 months ago
- This repository contains ready-to-use notebook examples for a wide variety of use cases in Amazon EMR Studio. ☆52 · Updated last year
- Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce. ☆38 · Updated 3 years ago
- A Pyspark job to handle upserts, conversion to parquet and create partitions on S3 ☆28 · Updated 5 years ago
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 · Updated 3 years ago
- Dockerizing an Apache Spark Standalone Cluster ☆43 · Updated 3 years ago
- Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines ☆134 · Updated 2 years ago
- Serverless ETL and Analytics with AWS Glue, published by Packt ☆51 · Updated 2 years ago
- ☆11 · Updated 5 years ago
- Delta Lake examples ☆229 · Updated last year
- EverythingApacheNiFi ☆115 · Updated last year
- One click deploy docker-compose with Kafka, Spark Streaming, Zeppelin UI and Monitoring (Grafana + Kafka Manager) ☆121 · Updated 4 years ago
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. ☆38 · Updated 4 years ago