Apache Spark docker image
☆2,059 · Apr 21, 2023 · Updated 2 years ago
Alternatives and similar repositories for docker-spark
Users who are interested in docker-spark are comparing it to the libraries listed below
- Apache Hadoop docker image ☆2,312 · Feb 1, 2024 · Updated 2 years ago
- [EXPERIMENTAL] This repo includes deployment instructions for running HDFS/Spark inside docker containers. Also includes spark-notebook a… ☆699 · Oct 1, 2020 · Updated 5 years ago
- A simple spark standalone cluster for your testing environment purposes ☆568 · Mar 6, 2024 · Updated 2 years ago
- Apache Flink docker image ☆197 · Jul 1, 2022 · Updated 3 years ago
- ☆1,080 · Jun 2, 2024 · Updated last year
- Docker build for Apache Spark ☆671 · Dec 30, 2021 · Updated 4 years ago
- ☆252 · Nov 15, 2022 · Updated 3 years ago
- ☆32 · Mar 7, 2018 · Updated 8 years ago
- Demo Spark application to transform data gathered on sensors for a heatmap application ☆33 · May 29, 2017 · Updated 8 years ago
- Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes. ☆3,106 · Feb 27, 2026 · Updated last week
- Bootstrap a pipeline on the BDE platform ☆27 · Sep 6, 2016 · Updated 9 years ago
- Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ☆509 · Nov 7, 2025 · Updated 4 months ago
- Apache Spark - A unified analytics engine for large-scale data processing ☆42,933 · Updated this week
- An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr… ☆8,608 · Updated this week
- Ready-to-run Docker images containing Jupyter applications ☆8,419 · Updated this week
- Docker Apache Airflow ☆3,808 · Mar 1, 2023 · Updated 3 years ago
- Dockerfile for Apache Kafka ☆6,980 · May 8, 2024 · Updated last year
- ☆762 · Mar 11, 2021 · Updated 4 years ago
- ☆26 · Nov 22, 2022 · Updated 3 years ago
- Hadoop docker image ☆1,207 · Jun 25, 2020 · Updated 5 years ago
- 50+ DockerHub public images for Docker & Kubernetes - DevOps, CI/CD, GitHub Actions, CircleCI, Jenkins, TeamCity, Alpine, CentOS, Debian,… ☆1,378 · Feb 3, 2026 · Updated last month
- Base classes to use when writing tests with Spark ☆1,549 · Dec 22, 2025 · Updated 2 months ago
- A curated list of awesome Apache Spark packages and resources. ☆1,863 · Feb 27, 2026 · Updated last week
- Apache Spark docker container image (Standalone mode) ☆35 · Oct 16, 2020 · Updated 5 years ago
- Upserts, Deletes And Incremental Processing on Big Data. ☆6,103 · Feb 27, 2026 · Updated last week
- Jupyter magics and kernels for working with remote Spark clusters ☆1,362 · Sep 9, 2025 · Updated 5 months ago
- REST job server for Apache Spark ☆2,843 · Jul 8, 2025 · Updated 7 months ago
- Docker multi-nodes Hadoop cluster with Spark 2.4.1 on Yarn ☆51 · Dec 7, 2020 · Updated 5 years ago
- spark on kubernetes ☆104 · Feb 20, 2023 · Updated 3 years ago
- A connector for Spark that allows reading and writing to/from Redis cluster ☆948 · Oct 22, 2024 · Updated last year
- Apache Airflow - A platform to programmatically author, schedule, and monitor workflows ☆44,510 · Updated this week
- The Metadata Platform for your Data and AI Stack ☆11,624 · Updated this week
- Interactive and Reactive Data Science using Scala and Spark. ☆3,150 · May 16, 2023 · Updated 2 years ago
- Postgresql configured to work as metastore for Hive. ☆32 · Dec 16, 2022 · Updated 3 years ago
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,588 · Feb 17, 2026 · Updated 2 weeks ago
- ETL best practices with airflow, with examples ☆1,354 · Sep 25, 2024 · Updated last year
- TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters. ☆3,859 · Jul 10, 2023 · Updated 2 years ago
- Hadoop, Hive, Spark, Zeppelin and Livy: all in one Docker-compose file. ☆172 · Feb 4, 2021 · Updated 5 years ago
- Spark Notebook docker image ☆10 · Dec 29, 2017 · Updated 8 years ago