Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker.
☆508 · Nov 7, 2025 · Updated 6 months ago
Alternatives and similar repositories for spark-standalone-cluster-on-docker
Users who are interested in spark-standalone-cluster-on-docker are comparing it to the libraries listed below.
- A simple Spark standalone cluster for your testing environment purposes (☆567 · Mar 6, 2024 · updated 2 years ago)
- This repo contains a Spark standalone cluster on Docker for anyone who wants to play with PySpark by submitting their applications. (☆37 · Jun 9, 2023 · updated 2 years ago)
- Tutorial for setting up a Spark cluster running inside Docker containers located on different machines (☆134 · Nov 4, 2022 · updated 3 years ago)
- Apache Spark Docker image (☆2,049 · Apr 20, 2026 · updated last month)
- Spark cluster in Docker containers with sample training Jupyter notebooks (☆26 · Feb 24, 2023 · updated 3 years ago)
- Spark on Kubernetes (☆104 · Feb 20, 2023 · updated 3 years ago)
- Docker with Airflow and a Spark standalone cluster (☆264 · Aug 5, 2023 · updated 2 years ago)
- Docker environment that spins up a MongoDB replica set, Spark, and JupyterLab. Example code uses PySpark and the MongoDB Spark Connector. (☆41 · Jan 5, 2023 · updated 3 years ago)
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. (☆38 · Mar 29, 2021 · updated 5 years ago)
- No description (☆47 · Jul 4, 2023 · updated 2 years ago)
- Hadoop, Hive, Spark, Zeppelin and Livy: all in one Docker-compose file. (☆169 · Feb 4, 2021 · updated 5 years ago)
- Spark Standalone & Livy (☆11 · Jul 13, 2021 · updated 4 years ago)
- A procedure to create a YARN cluster based on Docker, run Spark, and do a TPC-DS performance test. (☆16 · Jan 3, 2024 · updated 2 years ago)
- Code to demonstrate data engineering metadata & logging best practices (☆21 · Mar 12, 2024 · updated 2 years ago)
- Docker with Airflow + Postgres + Spark cluster + JDK (spark-submit support) + Jupyter Notebooks (☆24 · Apr 2, 2022 · updated 4 years ago)
- Sparglim✨ makes PySpark apps configurable and deploying a Spark Connect server easier! (☆42 · Jan 19, 2026 · updated 4 months ago)
- Python tool for profiling-based anomaly monitoring on ETL data pipelines leveraging ML and Apache Spark. (☆16 · Mar 5, 2024 · updated 2 years ago)
- One-click docker-compose deployment with Kafka, Spark Streaming, Zeppelin UI and monitoring (Grafana + Kafka Manager) (☆120 · Jul 20, 2021 · updated 4 years ago)
- Dockerizing an Apache Spark Standalone Cluster (☆42 · Jun 29, 2022 · updated 3 years ago)
- Apache Hive Metastore as a standalone server in Docker (☆80 · Aug 22, 2024 · updated last year)
- A project for the development of rich geospatial data from the city of São Paulo for use in machine learning models. (☆11 · Jul 4, 2021 · updated 4 years ago)
- Spark app to merge different schemas (☆23 · Dec 21, 2020 · updated 5 years ago)
- This project provides a reverse proxy for the Spark UI on Kubernetes (☆16 · Oct 12, 2023 · updated 2 years ago)
- Docker multi-node Hadoop cluster with Spark 2.4.1 on YARN (☆50 · Dec 7, 2020 · updated 5 years ago)
- Image building contents for running Spark standalone on Kubernetes (☆16 · Apr 10, 2020 · updated 6 years ago)
- No description (☆24 · Aug 8, 2021 · updated 4 years ago)
- Stream data from Databricks directly to PowerBI and CosmosDB! (☆12 · Sep 25, 2018 · updated 7 years ago)
- No description (☆94 · Feb 4, 2025 · updated last year)
- This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring… (☆1,246 · Sep 8, 2025 · updated 8 months ago)
- Notes from the DIO Aceleração Dev #4 classes on Data Engineering, taught by Everis. (☆13 · Feb 6, 2021 · updated 5 years ago)
- Implementing best practices for PySpark ETL jobs and applications. (☆2,102 · Jan 1, 2023 · updated 3 years ago)
- Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validatio… (☆56 · May 6, 2023 · updated 3 years ago)
- A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source. (☆346 · May 31, 2024 · updated last year)
- Building a Modern Data Lake with Minio, Spark, Airflow via Docker. (☆23 · May 11, 2024 · updated 2 years ago)
- A Kubernetes operator to enable GitOps style deploys for Databricks resources (☆16 · Jun 3, 2025 · updated 11 months ago)
- PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster (☆494 · Oct 15, 2024 · updated last year)
- The goal of this project is to build a docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Pos… (☆80 · Feb 27, 2023 · updated 3 years ago)
- PySpark test helper methods with beautiful error messages (☆769 · May 20, 2026 · updated last week)
- Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes. (☆3,125 · May 20, 2026 · updated last week)