cluster-apps-on-docker / spark-standalone-cluster-on-docker
Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker.
☆507 · Nov 7, 2025 · Updated 3 months ago
Alternatives and similar repositories for spark-standalone-cluster-on-docker
Users who are interested in spark-standalone-cluster-on-docker are comparing it to the libraries listed below.
- A simple spark standalone cluster for your testing environment purposes · ☆569 · Mar 6, 2024 · Updated last year
- This repo contains a spark standalone cluster on docker for anyone who wants to play with PySpark by submitting their applications. · ☆38 · Jun 9, 2023 · Updated 2 years ago
- Apache Spark docker image · ☆2,058 · Apr 21, 2023 · Updated 2 years ago
- Docker with Airflow and Spark standalone cluster · ☆262 · Aug 5, 2023 · Updated 2 years ago
- spark on kubernetes · ☆104 · Feb 20, 2023 · Updated 2 years ago
- Spark cluster in docker containers with sample training Jupyter notebooks · ☆27 · Feb 24, 2023 · Updated 2 years ago
- Spark development environment for kubernetes, spark-submit and jupyter notebook · ☆19 · Nov 30, 2021 · Updated 4 years ago
- Docker-compose contains the most common big data systems like: Apache Hadoop, Apache Hive, Apache Spark, Jupyter, Flink · ☆29 · Oct 9, 2023 · Updated 2 years ago
- Hadoop, Hive, Spark, Zeppelin and Livy: all in one Docker-compose file. · ☆171 · Feb 4, 2021 · Updated 5 years ago
- Hadoop-Hive-Spark cluster + Jupyter on Docker · ☆83 · Jan 2, 2025 · Updated last year
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. · ☆38 · Mar 29, 2021 · Updated 4 years ago
- A minimal docker compose setup for experimenting with cloud agnostic Lakehouse Architectures Apache Spark with Hive Metastore + Delta Lak… · ☆34 · Apr 17, 2024 · Updated last year
- Docker with Airflow + Postgres + Spark cluster + JDK (spark-submit support) + Jupyter Notebooks · ☆24 · Apr 2, 2022 · Updated 3 years ago
- Code to demonstrate data engineering metadata & logging best practices · ☆20 · Mar 12, 2024 · Updated last year
- ☆46 · Jul 4, 2023 · Updated 2 years ago
- Spark Standalone & Livy · ☆11 · Jul 13, 2021 · Updated 4 years ago
- Sparglim✨ makes PySpark App Configurable and Deploy Spark Connect Server Easier! · ☆42 · Jan 19, 2026 · Updated 3 weeks ago
- Docker multi-nodes Hadoop cluster with Spark 2.4.1 on Yarn · ☆51 · Dec 7, 2020 · Updated 5 years ago
- Dockerizing an Apache Spark Standalone Cluster · ☆42 · Jun 29, 2022 · Updated 3 years ago
- Apache Hive Metastore as a Standalone server in Docker · ☆80 · Aug 22, 2024 · Updated last year
- Big Data Ecosystem Docker · ☆426 · Apr 29, 2023 · Updated 2 years ago
- This project provides a reverse proxy for Spark UI on Kubernetes · ☆17 · Oct 12, 2023 · Updated 2 years ago
- A data engineering personal project for applying some of my skills · ☆19 · Jul 11, 2021 · Updated 4 years ago
- One click deploy docker-compose with Kafka, Spark Streaming, Zeppelin UI and Monitoring (Grafana + Kafka Manager) · ☆120 · Jul 20, 2021 · Updated 4 years ago
- Zeppelin docker · ☆16 · Nov 16, 2020 · Updated 5 years ago
- PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster · ☆488 · Oct 15, 2024 · Updated last year
- Implementing best practices for PySpark ETL jobs and applications. · ☆2,064 · Jan 1, 2023 · Updated 3 years ago
- Spark app to merge different schemas · ☆23 · Dec 21, 2020 · Updated 5 years ago
- This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring… · ☆1,218 · Sep 8, 2025 · Updated 5 months ago
- Operator for Apache Spark-on-Kubernetes for Stackable Data Platform · ☆69 · Feb 6, 2026 · Updated last week
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… · ☆56 · May 6, 2023 · Updated 2 years ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC) · ☆65 · Sep 23, 2023 · Updated 2 years ago
- pyspark methods to enhance developer productivity 📣 👯 🎉 · ☆682 · Mar 6, 2025 · Updated 11 months ago
- A data engineering project with Airflow, dbt, Terraform, GCP and much more! · ☆25 · Nov 8, 2022 · Updated 3 years ago
- Multi-container environment with Hadoop, Spark and Hive · ☆232 · May 5, 2025 · Updated 9 months ago
- Apache Spark with HDFS cluster within Kubernetes · ☆11 · Jul 11, 2023 · Updated 2 years ago
- A project for the development of rich geospatial data from the city of São Paulo for use in Machine Learning models. · ☆11 · Jul 4, 2021 · Updated 4 years ago
- This is a pipeline of an ETL application in GCP with open airport code data, which you can find here: https://datahub.io/core/airport-cod… · ☆15 · Nov 15, 2021 · Updated 4 years ago
- PySpark test helper methods with beautiful error messages · ☆752 · Jan 13, 2026 · Updated last month