dataflint/spark

Drop-in replacement for Apache Spark UI

☆477

Alternatives and similar repositories for spark

Users who are interested in spark are comparing it to the libraries listed below.

kubeflow / mcp-apache-spark-history-server
MCP Server and CLI for Apache Spark History Server. Debug Spark applications from AI agents, scripts, or the terminal.
☆183 • Updated this week
LucaCanali / sparkMeasure
This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simp…
☆827 • May 19, 2026 • Updated 2 months ago
apache / spark-kubernetes-operator
Apache Spark Kubernetes Operator
☆302 • Updated this week
apache / datafusion-comet
Apache DataFusion Comet Spark Accelerator
☆1,230 • Updated this week
cerndb / spark-dashboard
Spark-Dashboard is an open-source monitoring solution for Apache Spark that provides real-time performance dashboards using containers an…
☆137 • May 6, 2026 • Updated 2 months ago
G-Research / spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
☆238 • Jul 1, 2026 • Updated 2 weeks ago
apache / gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
☆1,576 • Updated this week
apache / celeborn
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
☆1,056 • Updated this week
lakekeeper / lakekeeper
Lakekeeper is an Apache-Licensed, secure, fast and easy to use Apache Iceberg REST Catalog written in Rust.
☆1,392 • Updated this week
apache / auron
The Auron accelerator for distributed computing framework (e.g., Spark) leverages native vectorized execution to accelerate query process…
☆1,778 • Updated this week
SemyonSinchenko / flake8-pyspark-with-column
A flake8 plugin that detects usage of withColumn in a loop or inside reduce
☆28 • Jun 20, 2025 • Updated last year
apache / kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
☆2,353 • Updated this week
MrPowers / chispa
PySpark test helper methods with beautiful error messages
☆772 • Jul 12, 2026 • Updated last week
datapunchorg / spark-ui-reverse-proxy
This project provides a reverse proxy for Spark UI on Kubernetes
☆16 • Oct 12, 2023 • Updated 2 years ago
unitycatalog / unitycatalog
Open, Multi-modal Catalog for Data & AI
☆3,462 • Updated this week
liaco / mimir
☆16 • Jul 25, 2025 • Updated 11 months ago
Nike-Inc / spark-expectations
A Python Library to support running data quality rules while the spark job is running⚡
☆201 • Updated this week
lakehq / sail
Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.
☆3,195 • Updated this week
nimtable / nimtable
The observability platform for Iceberg lakehouses.
☆468 • Jan 12, 2026 • Updated 6 months ago
apache / incubator-xtable
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processin…
☆1,194 • Updated this week
awslabs / python-deequ
Python API for Deequ
☆823 • Updated this week
apache / uniffle
Uniffle is a high performance, general purpose Remote Shuffle Service.
☆451 • Updated this week
qubole / sparklens
Qubole Sparklens tool for performance tuning Apache Spark
☆592 • Jun 26, 2024 • Updated 2 years ago
apache / polaris
Apache Polaris, the interoperable, open source catalog for Apache Iceberg
☆2,018 • Updated this week
mrpowers-io / levi
Delta Lake helper methods. No Spark dependency.
☆22 • Jan 19, 2026 • Updated 6 months ago
MartijnVisser / flink-only-sql
Traditionally, engineers were needed to implement business logic via data pipelines before business users can start using it. Using this …
☆12 • Updated this week
kubeflow / spark-operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
☆3,140 • Updated this week
Neutrinic / flare
Full-stack OpenTelemetry observability for Apache Spark
☆16 • Feb 28, 2026 • Updated 4 months ago
linkedin / openhouse
Open Control Plane for Tables in Data Lakehouse
☆392 • Updated this week
projectnessie / nessie
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
☆1,481 • Updated this week
apache / gravitino
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
☆3,097 • Updated this week
aws-samples / emr-remote-shuffle-service
☆18 • May 7, 2026 • Updated 2 months ago
cerndb / SparkPlugins
Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in…
☆96 • May 11, 2026 • Updated 2 months ago
mrpowers-io / jodie
Delta Lake and filesystem helper methods
☆51 • Feb 29, 2024 • Updated 2 years ago
linkedin / Hoptimator
Multi-hop declarative data pipelines
☆126 • Updated this week
datamechanics / delight
A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.
☆345 • May 31, 2024 • Updated 2 years ago
AbsaOSS / spline
Data Lineage Tracking And Visualization Solution
☆662 • Jul 13, 2026 • Updated last week
onehouseinc / lake-loader
A tool to benchmark L (loading) workloads within ETL workloads
☆32 • Updated this week
microsoft / LakeBench
A multi-modal Python library for benchmarking lakehouse engines and ELT scenarios, supporting both industry-standard and novel benchmarks…
☆52 • Updated this week