PierreKieffer / docker-spark-yarn-cluster
Docker multi-node Hadoop cluster with Spark 2.4.1 on YARN
★50 · Updated 4 years ago
Alternatives and similar repositories for docker-spark-yarn-cluster:
Users who are interested in docker-spark-yarn-cluster are comparing it to the repositories listed below.
- Base Docker image with just essentials: Hadoop, Hive and Spark. ★68 · Updated 3 years ago
- A simple Spark-powered ETL framework that just works ★178 · Updated last year
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive ★186 · Updated last year
- Examples of Spark 3.0 ★47 · Updated 4 years ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ★118 · Updated last month
- Custom state store providers for Apache Spark ★92 · Updated 2 years ago
- DataQuality for BigData ★143 · Updated last year
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ★97 · Updated last year
- The Internals of Spark on Kubernetes ★70 · Updated 2 years ago
- Spline agent for Apache Spark ★191 · Updated last week
- Sample processing code using Spark 2.1+ and Scala ★51 · Updated 4 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ★114 · Updated this week
- How to manage Slowly Changing Dimensions with Apache Hive ★55 · Updated 5 years ago
- ★71 · Updated 3 years ago
- Flowchart for debugging Spark applications ★104 · Updated 3 months ago
- Build configuration-driven ETL pipelines on Apache Spark ★159 · Updated 2 years ago
- Filling in the Spark function gaps across APIs ★50 · Updated 3 years ago
- Spark Structured Streaming examples using version 3.5.1 ★26 · Updated 8 months ago
- A library that provides useful extensions to Apache Spark and PySpark. ★205 · Updated last month
- Spark Structured Streaming / Kafka / Cassandra / Elastic ★183 · Updated last year
- ★63 · Updated 5 years ago
- The Internals of Delta Lake ★183 · Updated this week
- ACID Data Source for Apache Spark based on Hive ACID ★97 · Updated 3 years ago
- Spark-Radiant is Apache Spark Performance and Cost Optimizer ★25 · Updated 2 weeks ago
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in… ★86 · Updated 9 months ago
- Visualize column-level data lineage in Spark SQL ★88 · Updated 2 years ago
- Code snippets used in demos recorded for the blog. ★29 · Updated this week
- Hadoop, Hive, Spark, Zeppelin and Livy: all in one Docker-compose file. ★161 · Updated 3 years ago
- The iterative broadcast join example code. ★69 · Updated 7 years ago
- Examples of Spark 2.0 ★211 · Updated 3 years ago