Official Dockerfile for Apache Spark
☆166 · Feb 18, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for spark-docker
Users who are interested in spark-docker are comparing it to the repositories listed below.
- Sparglim ✨ makes PySpark apps configurable and Spark Connect Server deployment easier! ☆42 · Jan 19, 2026 · Updated last month
- ☆12 · Feb 18, 2026 · Updated 2 weeks ago
- ☆20 · Nov 17, 2025 · Updated 3 months ago
- Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes. ☆3,106 · Updated this week
- ☆18 · Nov 4, 2024 · Updated last year
- Example of how to build machine learning training workflow on AWS by Prefect ☆12 · Nov 2, 2022 · Updated 3 years ago
- A Spark data source for reading Microsoft Excel files ☆13 · Jul 1, 2024 · Updated last year
- Helm charts for Trino and Trino Gateway ☆193 · Feb 23, 2026 · Updated last week
- Apache Spark Connect Client for Golang ☆247 · Oct 13, 2025 · Updated 4 months ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆134 · Jan 5, 2026 · Updated 2 months ago
- Provides a wrapper for the UNIX-style Database Manager Library ☆18 · Dec 26, 2025 · Updated 2 months ago
- DuckDB demo with the dashboarding tools Evidence, Streamlit, and Rill ☆21 · Dec 18, 2023 · Updated 2 years ago
- Python API for Deequ ☆814 · Jan 21, 2026 · Updated last month
- Kubernetes Helm Chart to deploy Apache Atlas ☆16 · Oct 19, 2020 · Updated 5 years ago
- A Jupyter Server Extension Providing Support for Terminals ☆20 · Jan 14, 2026 · Updated last month
- The official Model Context Protocol (MCP) server for DataHub (https://datahub.com) ☆68 · Feb 25, 2026 · Updated last week
- A tool to get better debug info on spark's memory usage ☆42 · Aug 21, 2019 · Updated 6 years ago
- Apache Spark Website ☆134 · Feb 25, 2026 · Updated last week
- Collection of NiFi-related stuff ☆24 · Oct 27, 2022 · Updated 3 years ago
- ☆243 · Updated this week
- A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer ☆52 · Oct 30, 2023 · Updated 2 years ago
- Trino + Hive + MinIO with PostgreSQL in Docker Compose ☆28 · Aug 18, 2023 · Updated 2 years ago
- Official Dockerfile for Delta Lake ☆60 · Feb 24, 2026 · Updated last week
- Adapter for dbt that executes dbt pipelines on Apache Flink ☆96 · Mar 19, 2024 · Updated last year
- Apache DataFusion Comet Spark Accelerator ☆1,148 · Updated this week
- Spark integrations for working with Lance datasets ☆45 · Updated this week
- A library that provides useful extensions to Apache Spark and PySpark. ☆232 · Jan 20, 2026 · Updated last month
- Linter config initializer for Python ☆27 · Sep 25, 2023 · Updated 2 years ago
- Operator for Apache Spark-on-Kubernetes for Stackable Data Platform ☆69 · Feb 26, 2026 · Updated last week
- Official Java implementation of Apache Arrow ☆82 · Updated this week
- Repository of helm charts for deploying DataHub on a Kubernetes cluster ☆203 · Feb 26, 2026 · Updated last week
- ☆26 · Dec 18, 2020 · Updated 5 years ago
- BayesML: your first library for Bayesian machine learning ☆16 · Jan 27, 2026 · Updated last month
- ☆16 · Nov 4, 2024 · Updated last year
- Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere. ☆946 · Feb 26, 2026 · Updated last week
- An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr… ☆8,608 · Updated this week
- This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spa… ☆816 · Updated this week
- Dockerized monitoring stack for Apache Airflow ☆36 · Sep 8, 2024 · Updated last year
- Fast, pure-Rust reader and writer for Well-Known Binary geometries ☆36 · Nov 21, 2025 · Updated 3 months ago