cluster-apps-on-docker / spark-standalone-cluster-on-docker

Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker.

☆500

Alternatives and similar repositories for spark-standalone-cluster-on-docker

Users who are interested in spark-standalone-cluster-on-docker are comparing it to the repositories listed below.

cordon-thiago / airflow-spark
Docker with Airflow and Spark standalone cluster
☆262 Updated 2 years ago
sdesilva26 / docker-spark
Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines
☆134 Updated 3 years ago
mvillarrealb / docker-spark-cluster
A simple Spark standalone cluster for your testing environment purposes
☆570 Updated last year
MrPowers / chispa
PySpark test helper methods with beautiful error messages
☆735 Updated this week
dsynkov / spark-livy-on-airflow-workspace
A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment.
☆38 Updated 4 years ago
josephmachado / spark_submit_airflow
Simple repo to demonstrate how to submit a spark job to EMR from Airflow
☆34 Updated 5 years ago
Marcel-Jan / docker-hadoop-spark
Multi-container environment with Hadoop, Spark and Hive
☆226 Updated 7 months ago
mrpowers-io / quinn
pyspark methods to enhance developer productivity 📣 👯 🎉
☆676 Updated 9 months ago
cartershanklin / pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
☆481 Updated last year
BasPH / data-pipelines-with-apache-airflow
Code for Data Pipelines with Apache Airflow
☆809 Updated last year
delta-io / delta-examples
Delta Lake examples
☆234 Updated last year
awslabs / python-deequ
Python API for Deequ
☆806 Updated 8 months ago
bitsondatadev / trino-getting-started
☆269 Updated last year
arezamoosavi / AcidOnSpark-ETL
Delta-Lake, ETL, Spark, Airflow
☆48 Updated 3 years ago
mrpowers-io / spark-style-guide
Spark style guide
☆266 Updated last year
MrPowers / mack
Delta Lake helper methods in PySpark
☆325 Updated last year
mahmoudparsian / data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
☆225 Updated 2 years ago
spark-examples / spark-scala-examples
This project provides Apache Spark SQL, RDD, DataFrame and Dataset examples in Scala language
☆568 Updated last year
dbt-labs / dbt-spark
This repository has moved into https://github.com/dbt-labs/dbt-adapters
☆443 Updated 4 months ago
mrn-aglic / pyspark-playground
☆92 Updated 10 months ago
astronomer / astro-cli
CLI that makes it easy to create, test and deploy Airflow DAGs to Astronomer
☆427 Updated last week
Armaan1Gohil / dataengineering-tech-stack
Local Environment to Practice Data Engineering
☆143 Updated 11 months ago
josephmachado / data_engineering_project_template
A template repository to create a data project with IAC, CI/CD, Data migrations, & testing
☆280 Updated last year
palantir / pyspark-style-guide
This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring…
☆1,199 Updated 3 months ago
Nike-Inc / spark-expectations
A Python Library to support running data quality rules while the spark job is running⚡
☆193 Updated this week
gocardless / airflow-dbt
Apache Airflow integration for dbt
☆411 Updated last year
bartosz25 / data-engineering-design-patterns-book
Code snippets for Data Engineering Design Patterns book
☆288 Updated 8 months ago
YotpoLtd / metorikku
A simplified, lightweight ETL Framework based on Apache Spark
☆586 Updated last year
tuanavu / airflow-tutorial
Apache Airflow tutorial
☆972 Updated 3 years ago
andreax79 / airflow-code-editor
A plugin for Apache Airflow that allows you to edit DAGs in browser
☆454 Updated 3 weeks ago