cluster-apps-on-docker / spark-standalone-cluster-on-docker
Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker.
☆462 Updated last year
Related projects
Alternatives and complementary repositories for spark-standalone-cluster-on-docker
- Docker with Airflow and Spark standalone cluster ☆245 Updated last year
- Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines ☆123 Updated 2 years ago
- Multi-container environment with Hadoop, Spark and Hive ☆203 Updated 10 months ago
- PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster ☆424 Updated last month
- A simple Spark standalone cluster for your testing environment purposes ☆557 Updated 8 months ago
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. ☆39 Updated 3 years ago
- PySpark test helper methods with beautiful error messages ☆621 Updated 3 weeks ago
- PySpark RDD, DataFrame and Dataset examples in Python ☆1,175 Updated 7 months ago
- Apache Airflow in Docker Compose (for both versions 1.10.* and 2.*) ☆184 Updated 11 months ago
- 🐍 Quick reference guide to common patterns & functions in PySpark. ☆451 Updated last year
- O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian ☆209 Updated last year
- Fundamentals of Spark with Python (using PySpark), code examples ☆335 Updated 2 years ago
- ☆252 Updated 3 weeks ago
- PySpark methods to enhance developer productivity 📣 👯 🎉 ☆643 Updated last month
- This repo contains a Spark standalone cluster on Docker for anyone who wants to play with PySpark by submitting their applications. ☆23 Updated last year
- A simplified, lightweight ETL framework based on Apache Spark ☆584 Updated 9 months ago
- This project provides Apache Spark SQL, RDD, DataFrame and Dataset examples in Scala ☆559 Updated 8 months ago
- ☆23 Updated 3 years ago
- Spark style guide ☆256 Updated last month
- Implementing best practices for PySpark ETL jobs and applications. ☆1,691 Updated last year
- Delta Lake helper methods in PySpark ☆304 Updated 2 months ago
- Apache Spark 3 - Structured Streaming Course Material ☆119 Updated last year
- Apache Airflow tutorial ☆934 Updated 2 years ago
- Hadoop-Hive-Spark cluster + Jupyter on Docker ☆61 Updated 5 months ago
- Simple repo to demonstrate how to submit a Spark job to EMR from Airflow ☆32 Updated 4 years ago
- Code for Data Pipelines with Apache Airflow ☆719 Updated 3 months ago
- Apache Spark 3 - Spark Programming in Python for Beginners ☆384 Updated 3 months ago