Official Dockerfile for Apache Spark
☆166Feb 18, 2026Updated last month
Alternatives and similar repositories for spark-docker
Users that are interested in spark-docker are comparing it to the libraries listed below.
- Apache Spark Kubernetes Operator☆268Updated this week
- A set of transformations for Kafka Connect☆23Mar 1, 2026Updated 3 weeks ago
- A Spark data source for reading Microsoft Excel files☆13Jul 1, 2024Updated last year
- Spark-Dashboard is an open-source monitoring solution for Apache Spark that provides real-time performance dashboards using containers an…☆134Updated this week
- A tool to get better debug info on Spark's memory usage☆42Aug 21, 2019Updated 6 years ago
- Sparglim✨ makes PySpark App Configurable and Deploy Spark Connect Server Easier!☆42Jan 19, 2026Updated 2 months ago
- Python API for Deequ☆812Mar 9, 2026Updated 2 weeks ago
- Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.☆1,039Mar 19, 2026Updated last week
- ☆12Feb 18, 2026Updated last month
- The official Model Context Protocol (MCP) server for DataHub (https://datahub.com)☆69Mar 17, 2026Updated last week
- A library for reading data from Amazon S3 with optimised listing using Amazon SQS using Spark SQL Streaming (or Structured Streaming).☆18Apr 20, 2024Updated last year
- Enables automatic refactoring and linting of Maven projects written in Scala using Scalafix.☆26Mar 14, 2026Updated last week
- Apache DataFusion Comet Spark Accelerator☆1,154Updated this week
- Apache Flink☆21Jan 26, 2026Updated last month
- This repository contains code samples shared on https://dev.java/ and https://inside.java/☆14Jun 16, 2024Updated last year
- Apache Spark Website☆134Mar 12, 2026Updated last week
- Maven packaging and lifecycle for Trino plugins☆15Jan 26, 2026Updated last month
- A simple Spark standalone cluster for your testing environment purposes☆566Mar 6, 2024Updated 2 years ago
- A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer☆52Oct 30, 2023Updated 2 years ago
- Official Java implementation of Apache Arrow☆83Mar 16, 2026Updated last week
- ☆20Nov 17, 2025Updated 4 months ago
- Kubernetes Helm Chart to deploy Apache Atlas☆16Oct 19, 2020Updated 5 years ago
- Official Dockerfile for Delta Lake☆61Feb 24, 2026Updated last month
- A library that provides useful extensions to Apache Spark and PySpark.☆235Mar 18, 2026Updated last week
- Open source LLM trace tool for Langchain built on MongoDB☆13Updated this week
- Code for Apache Hudi, Apache Iceberg and Delta Lake analysis☆10Feb 2, 2024Updated 2 years ago
- A tool for translating Scala source code into readable and maintainable Java code☆13Jan 3, 2026Updated 2 months ago
- How to set up a minimal Hadoop cluster using Docker☆11Mar 13, 2022Updated 4 years ago
- Example of how to build a machine learning training workflow on AWS with Prefect☆12Nov 2, 2022Updated 3 years ago
- Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.☆2,311Updated this week
- Adapter for dbt that executes dbt pipelines on Apache Flink☆97Mar 19, 2024Updated 2 years ago
- Docker image for Spark history server on Kubernetes☆15Mar 13, 2020Updated 6 years ago
- This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simp…☆818Mar 4, 2026Updated 3 weeks ago
- Simple inverted-index full-text search engine written in Python☆13Oct 3, 2013Updated 12 years ago
- Data Observability for Data Engineering, published by Packt Publishing☆11Jan 24, 2025Updated last year
- Docker image that builds a patched Apache Spark with AWS Glue support as metastore☆17Jun 8, 2024Updated last year
- Provides a wrapper for the UNIX-style Database Manager Library☆18Dec 26, 2025Updated 2 months ago
- Repository for restaurant reviews :+1:☆29Jun 21, 2024Updated last year
- All the things about TPC-DS in Apache Spark☆109Jun 15, 2023Updated 2 years ago