cerndb/SparkPlugins

Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins (available since Spark 3.0) allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.

☆96
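
As a minimal sketch of the idea, the Scala class below implements Spark's plugin interface (`org.apache.spark.api.plugin.SparkPlugin`). It assumes Spark 3.x with `spark-core` on the classpath, which also brings in the Dropwizard `metrics-core` library. The executor-side `init` method runs once as each executor starts, and it registers two gauges with the metric registry that Spark passes in through `PluginContext`. The class name `ExecutorStartupProbePlugin` and the metric names are hypothetical illustrations. They are not taken from this repository.

```scala
// Minimal sketch of a Spark plugin (Spark 3.x plugin API).
// Assumes spark-core (which pulls in Dropwizard metrics-core) is on the classpath.
// Class and metric names are hypothetical, chosen for illustration only.
import java.util.{Map => JMap}

import com.codahale.metrics.{Gauge, MetricRegistry}
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

class ExecutorStartupProbePlugin extends SparkPlugin {

  // No driver-side component is needed here; the API allows returning null.
  override def driverPlugin(): DriverPlugin = null

  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {

    // init runs once on each executor during its startup: custom code placed
    // here executes on the executor, and user-provided metrics are registered.
    override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
      val startMillis = System.currentTimeMillis()
      val registry: MetricRegistry = ctx.metricRegistry()

      // Probe 1: milliseconds elapsed since this executor's plugin was initialized.
      registry.register(
        MetricRegistry.name("executorUptimeMillis"),
        new Gauge[Long] {
          override def getValue: Long = System.currentTimeMillis() - startMillis
        })

      // Probe 2: JVM heap currently in use on this executor, in bytes.
      registry.register(
        MetricRegistry.name("jvmUsedHeapBytes"),
        new Gauge[Long] {
          override def getValue: Long = {
            val rt = Runtime.getRuntime
            rt.totalMemory() - rt.freeMemory()
          }
        })
    }
  }
}
```

To try it, package the compiled class into a jar and ship it to the cluster, for example with `--jars`. Then enable it with `--conf spark.plugins=ExecutorStartupProbePlugin`. The value is the plugin's fully qualified class name, and `spark.plugins` accepts a comma-separated list. The gauges are then reported through whichever sinks the Spark metrics system is configured to use.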

Alternatives and similar repositories for SparkPlugins

Users that are interested in SparkPlugins are comparing it to the libraries listed below.

cerndb / spark-dashboard
Spark-Dashboard is an open-source monitoring solution for Apache Spark that provides real-time performance dashboards using containers an…
☆137 · May 6, 2026 · Updated 2 months ago
scravy / pysparkextra
☆10 · Jun 29, 2021 · Updated 5 years ago
LucaCanali / sparkMeasure
This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simp…
☆827 · May 19, 2026 · Updated 2 months ago
yaooqinn / itachi
A library that brings useful functions from various modern database management systems to Apache Spark
☆63 · Sep 4, 2023 · Updated 2 years ago
G-Research / spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
☆239 · Jul 1, 2026 · Updated 3 weeks ago
SaurabhChawla100 / spark-radiant
Spark-Radiant is an Apache Spark performance and cost optimizer.
☆25 · Dec 31, 2024 · Updated last year
banzaicloud / spark-metrics
Spark metrics related custom classes and sinks (e.g. Prometheus)
☆186 · Aug 2, 2022 · Updated 3 years ago
steveloughran / zero-rename-committer
Paper: A Zero-rename committer for object stores
☆20 · Nov 7, 2025 · Updated 8 months ago
qubole / sparklens
Qubole Sparklens tool for performance tuning Apache Spark
☆592 · Jun 26, 2024 · Updated 2 years ago
oracle / spark-oracle
On-the-fly translation of Spark programs to run natively on your Oracle DB. Your Spark programs require no changes.
☆36 · Apr 15, 2025 · Updated last year
LucaCanali / Miscellaneous
Includes notes on using Apache Spark, with a drill-down on Spark for Physics, how to run TPC-DS on PySpark, how to create histograms with S…
☆464 · May 19, 2026 · Updated 2 months ago
qubole / spark-state-store
RocksDB state storage implementation for Structured Streaming.
☆17 · Oct 21, 2020 · Updated 5 years ago
lightcopy / parquet-index
Spark SQL index for Parquet tables
☆134 · May 6, 2021 · Updated 5 years ago
NVIDIA / cudf-spark
NVIDIA cuDF for Apache Spark plugin - accelerate Apache Spark with GPUs
☆990 · Updated this week
apache / gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
☆1,576 · Updated this week
microsoft / hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
☆430 · Jan 14, 2022 · Updated 4 years ago
japila-books / delta-lake-internals
The Internals of Delta Lake
☆186 · Jun 18, 2026 · Updated last month
apache / datafusion-comet
Apache DataFusion Comet Spark Accelerator
☆1,233 · Updated this week
target / data-validator
A tool to validate data, built around Apache Spark.
☆102 · Jun 15, 2026 · Updated last month
mrpowers-io / spark-daria
Essential Spark extensions and helper methods ✨😲
☆767 · Jun 22, 2026 · Updated last month
VirtusLab / iskra
Typesafe wrapper for Apache Spark DataFrame API
☆143 · Jan 24, 2026 · Updated 6 months ago
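melin / datatunnel
DataTunnel is an extremely high-performance distributed data integration tool built on the Spark engine that supports synchronizing massive volumes of data. Its DSL syntax, built on Spark extensions and combined with Spark SQL, fits conveniently into data warehouse ETLT pipelines and is simple to use.
☆36 · Updated this week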
amzn / amazon-codeguru-profiler-for-spark
A Spark plugin for CPU and memory profiling
☆21 · Mar 17, 2026 · Updated 4 months ago
typelevel / frameless
Expressive types for Spark.
☆898 · Updated this week
oap-project / gazelle_plugin
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
☆255 · Feb 21, 2023 · Updated 3 years ago
Nike-Inc / spark-expectations
A Python library to support running data quality rules while the Spark job is running ⚡
☆201 · Jul 14, 2026 · Updated last week
bartosz25 / spark-scala-playground
Sample processing code using Spark 2.1+ and Scala
☆51 · Jun 28, 2020 · Updated 6 years ago
qubole / spark-acid
ACID Data Source for Apache Spark based on Hive ACID
☆97 · Jul 7, 2021 · Updated 5 years ago
apache / datafu
Mirror of Apache DataFu
☆124 · Jul 9, 2026 · Updated 2 weeks ago
mrpowers-io / spark-fast-tests
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
☆457 · Apr 2, 2026 · Updated 3 months ago
funkyminds / cleanframes
type-class based data cleansing library for Apache Spark SQL
☆79 · Jun 23, 2019 · Updated 7 years ago
Databeans / lighthouse
Shed light on your data layout in order to monitor the health of your Lakehouse tables and identify when data maintenance operations shou…
☆10 · Jul 31, 2023 · Updated 2 years ago
apache / auron
The Auron accelerator for distributed computing frameworks (e.g., Spark) leverages native vectorized execution to accelerate query process…
☆1,780 · Updated this week
databricks / spark-sql-perf
☆623 · Feb 26, 2022 · Updated 4 years ago
MemVerge / splash
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
☆131 · Dec 19, 2024 · Updated last year
bluejoe2008 / spark-http-stream
Spark Structured Streaming via HTTP communication
☆18 · Jul 7, 2022 · Updated 4 years ago
akolb1 / gometastore
Go Client for Hive Metastore
☆14 · Dec 18, 2022 · Updated 3 years ago
airbnb / sputnik
☆64 · Nov 8, 2019 · Updated 6 years ago
wankunde / sql-runner
☆17 · Mar 19, 2024 · Updated 2 years ago