Code and examples showing how to write and deploy Apache Spark plugins. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.
☆94 · May 9, 2025 · Updated 9 months ago
Alternatives and similar repositories for SparkPlugins
Users that are interested in SparkPlugins are comparing it to the libraries listed below
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆134 · Jan 5, 2026 · Updated last month
- This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spa… ☆816 · Updated this week
- ☆10 · Jun 29, 2021 · Updated 4 years ago
- Spark-Radiant is Apache Spark Performance and Cost Optimizer ☆25 · Dec 31, 2024 · Updated last year
- A library that brings useful functions from various modern database management systems to Apache Spark ☆61 · Sep 4, 2023 · Updated 2 years ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆232 · Jan 20, 2026 · Updated last month
- On the fly, translation of Spark programs to run natively on your Oracle DB. Your Spark programs require no changes. ☆35 · Apr 15, 2025 · Updated 10 months ago
- Qubole Sparklens tool for performance tuning Apache Spark ☆590 · Jun 26, 2024 · Updated last year
- Spark SQL index for Parquet tables ☆134 · May 6, 2021 · Updated 4 years ago
- An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads. ☆431 · Jan 14, 2022 · Updated 4 years ago
- This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w… ☆16 · Oct 3, 2025 · Updated 5 months ago
- Sample processing code using Spark 2.1+ and Scala ☆51 · Jun 28, 2020 · Updated 5 years ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆29 · Nov 4, 2024 · Updated last year
- The Internals of Delta Lake ☆188 · Nov 30, 2025 · Updated 3 months ago
- A tool to validate data, built around Apache Spark. ☆101 · Feb 19, 2026 · Updated last week
- Rocksdb state storage implementation for Structured Streaming. ☆17 · Oct 21, 2020 · Updated 5 years ago
- ACID Data Source for Apache Spark based on Hive ACID ☆96 · Jul 7, 2021 · Updated 4 years ago
- Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines. ☆1,521 · Updated this week
- A Python Library to support running data quality rules while the spark job is running ⚡ ☆200 · Updated this week
- Typesafe wrapper for Apache Spark DataFrame API ☆144 · Jan 24, 2026 · Updated last month
- Paper: A Zero-rename committer for object stores ☆20 · Nov 7, 2025 · Updated 3 months ago
- Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations. ☆257 · Feb 21, 2023 · Updated 3 years ago
- Spark RAPIDS plugin - accelerate Apache Spark with GPUs ☆965 · Updated this week
- type-class based data cleansing library for Apache Spark SQL ☆78 · Jun 23, 2019 · Updated 6 years ago
- Apache DataFusion Comet Spark Accelerator ☆1,148 · Updated this week
- Essential Spark extensions and helper methods ✨😲 ☆766 · Sep 14, 2025 · Updated 5 months ago
- Includes notes on using Apache Spark, with drill down on Spark for Physics, how to run TPCDS on PySpark, how to create histograms with S… ☆459 · Dec 15, 2025 · Updated 2 months ago
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange ☆130 · Dec 19, 2024 · Updated last year
- Shed light on your data layout in order to monitor the health of your Lakehouse tables and identify when data maintenance operations shou… ☆10 · Jul 31, 2023 · Updated 2 years ago
- Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster. ☆300 · Jul 13, 2025 · Updated 7 months ago
- Expressive types for Spark. ☆895 · Updated this week
- ☆63 · Nov 8, 2019 · Updated 6 years ago
- Extensible streaming ingestion pipeline on top of Apache Spark ☆46 · Jul 17, 2025 · Updated 7 months ago
- Spark Streaming Checkpoint File Manager for MinIO ☆11 · Apr 25, 2023 · Updated 2 years ago
- Data Lineage Tracking And Visualization Solution ☆656 · Updated this week
- A Kubernetes Scheduler Extender to provide gang scheduling support for Spark on Kubernetes ☆177 · Apr 23, 2023 · Updated 2 years ago
- Flowchart for debugging Spark applications ☆106 · Sep 25, 2024 · Updated last year
- A simplified, lightweight ETL Framework based on Apache Spark ☆587 · Jan 24, 2024 · Updated 2 years ago
- A re-implementation of Hadoop DistCP in Apache Spark ☆47 · Dec 20, 2023 · Updated 2 years ago