Wittline / apache-spark-docker
Dockerizing an Apache Spark Standalone Cluster
☆43Updated 2 years ago
Alternatives and similar repositories for apache-spark-docker:
Users who are interested in apache-spark-docker are comparing it to the repositories listed below.
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…☆53Updated last year
- Data Engineering with Spark and Delta Lake☆94Updated 2 years ago
- This repository contains code for Spark Streaming☆21Updated 3 years ago
- PySpark Cheatsheet☆36Updated 2 years ago
- Spark on Kubernetes using Helm☆34Updated 4 years ago
- Nested Data (JSON/AVRO/XML) Parsing and Flattening in Spark☆16Updated last year
- Magic to help Spark pipelines upgrade☆34Updated 4 months ago
- How to manage Slowly Changing Dimensions with Apache Hive☆55Updated 5 years ago
- Base Docker image with just essentials: Hadoop, Hive and Spark.☆68Updated 3 years ago
- Delta-Lake, ETL, Spark, Airflow☆46Updated 2 years ago
- Apache Spark Structured Streaming with Kafka using Python (PySpark)☆41Updated 5 years ago
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment.☆39Updated 3 years ago
- Presto Trino with Apache Hive Postgres metastore☆38Updated 4 months ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0☆97Updated last year
- Dockerizing and Consuming an Apache Livy environment☆11Updated 2 years ago
- ☆23Updated 4 years ago
- Delta Lake examples☆214Updated 3 months ago
- One click deploy docker-compose with Kafka, Spark Streaming, Zeppelin UI and Monitoring (Grafana + Kafka Manager)☆119Updated 3 years ago
- Quick Guides from Dremio on Several topics☆67Updated last week
- Developed a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation,…☆90Updated 3 years ago
- A course by DataTalks Club that covers Spark, Kafka, Docker, Airflow, Terraform, DBT, Big Query etc☆14Updated 2 years ago
- Spark data pipeline that processes movie ratings data.☆27Updated this week
- ETL pipeline using pyspark (Spark - Python)☆112Updated 4 years ago
- ☆14Updated 5 years ago
- ☆40Updated 6 months ago
- Simple stream processing pipeline☆96Updated 7 months ago
- PySpark-ETL☆23Updated 5 years ago
- Materials for the next course☆24Updated last year
- Build and run Spark Structured Streaming pipelines in Hadoop - project using PySpark.☆12Updated 5 years ago
- The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on…☆26Updated 2 years ago