Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.
☆95 · May 9, 2025 · Updated 11 months ago
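As a rough illustration of the mechanism described above, here is a minimal sketch of an executor-side plugin written against the `org.apache.spark.api.plugin` interfaces available in Spark 3.x. It is not code from the SparkPlugins repository: the class name `DemoPlugin` and the gauge name `demoUptimeMillis` are hypothetical, and compiling it assumes Spark and its bundled Dropwizard metrics library are on the classpath.

```scala
import java.util.{Map => JMap}

import com.codahale.metrics.Gauge
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

// Hypothetical plugin class, named here only for illustration.
class DemoPlugin extends SparkPlugin {

  // No driver-side component in this sketch; the API permits returning null.
  override def driverPlugin(): DriverPlugin = null

  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {

    // Spark calls init once on each executor as it starts up.
    override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
      val startedAt = System.currentTimeMillis()

      // Register a user-provided probe with the executor's metric registry.
      // "demoUptimeMillis" is an assumed metric name, not one defined by Spark.
      ctx.metricRegistry.register("demoUptimeMillis", new Gauge[Long] {
        override def getValue: Long = System.currentTimeMillis() - startedAt
      })
    }
  }
}
```

A plugin like this would typically be packaged in a jar made available to the executors and enabled by setting the `spark.plugins` configuration property to its fully qualified class name.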
Alternatives and similar repositories for SparkPlugins
Users that are interested in SparkPlugins are comparing it to the libraries listed below.
- ☆10 · Jun 29, 2021 · Updated 4 years ago
- This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simp… ☆820 · Apr 1, 2026 · Updated last week
- A library that brings useful functions from various modern database management systems to Apache Spark ☆62 · Sep 4, 2023 · Updated 2 years ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆235 · Mar 18, 2026 · Updated 3 weeks ago
- Spark-Radiant is Apache Spark Performance and Cost Optimizer ☆25 · Dec 31, 2024 · Updated last year
- Spark metrics related custom classes and sinks (e.g. Prometheus) ☆188 · Aug 2, 2022 · Updated 3 years ago
- Paper: A Zero-rename committer for object stores ☆20 · Nov 7, 2025 · Updated 5 months ago
- Qubole Sparklens tool for performance tuning Apache Spark ☆589 · Jun 26, 2024 · Updated last year
- On the fly, translation of Spark programs to run natively on your Oracle DB. Your Spark programs require no changes. ☆35 · Apr 15, 2025 · Updated 11 months ago
- Includes notes on using Apache Spark, with drill down on Spark for Physics, how to run TPCDS on PySpark, how to create histograms with S… ☆462 · Dec 15, 2025 · Updated 3 months ago
- Rocksdb state storage implementation for Structured Streaming. ☆17 · Oct 21, 2020 · Updated 5 years ago
- Spark SQL index for Parquet tables ☆135 · May 6, 2021 · Updated 4 years ago
- Spark RAPIDS plugin - accelerate Apache Spark with GPUs ☆973 · Updated this week
- An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads. ☆431 · Jan 14, 2022 · Updated 4 years ago
- The Internals of Delta Lake ☆188 · Nov 30, 2025 · Updated 4 months ago
- REST job server for Apache Spark ☆44 · May 23, 2025 · Updated 10 months ago
- A tool to validate data, built around Apache Spark. ☆101 · Feb 19, 2026 · Updated last month
- Essential Spark extensions and helper methods ✨😲 ☆766 · Sep 14, 2025 · Updated 7 months ago
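- DataTunnel is a very high-performance distributed data integration tool built on the Spark engine that supports synchronizing massive volumes of data. Its DSL syntax, built on Spark extensions and combined with Spark SQL, makes it easier to fit into data warehouse ETLT workflows and keeps it simple to use. ☆36 · Updated this week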
- This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenario… ☆27 · Mar 17, 2026 · Updated 3 weeks ago
- Expressive types for Spark. ☆896 · Apr 7, 2026 · Updated last week
- A Spark plugin for CPU and memory profiling ☆21 · Mar 17, 2026 · Updated 3 weeks ago
- Sample processing code using Spark 2.1+ and Scala ☆51 · Jun 28, 2020 · Updated 5 years ago
- A Python Library to support running data quality rules while the spark job is running⚡ ☆201 · Updated this week
- Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations. ☆257 · Feb 21, 2023 · Updated 3 years ago
- ACID Data Source for Apache Spark based on Hive ACID ☆96 · Jul 7, 2021 · Updated 4 years ago
- type-class based data cleansing library for Apache Spark SQL ☆78 · Jun 23, 2019 · Updated 6 years ago
- Mirror of Apache DataFu ☆122 · May 20, 2025 · Updated 10 months ago
- Using log4j insert log info into ElasticSearch ☆26 · Oct 31, 2016 · Updated 9 years ago
- Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit) ☆455 · Apr 2, 2026 · Updated last week
- The Auron accelerator for distributed computing framework (e.g., Spark) leverages native vectorized execution to accelerate query process… ☆1,743 · Updated this week
- ☆63 · Nov 8, 2019 · Updated 6 years ago
- Shed light on your data layout in order to monitor the health of your Lakehouse tables and identify when data maintenance operations shou… ☆10 · Jul 31, 2023 · Updated 2 years ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆29 · Nov 4, 2024 · Updated last year
- Go Client for Hive Metastore ☆14 · Dec 18, 2022 · Updated 3 years ago
- Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster. ☆301 · Jul 13, 2025 · Updated 9 months ago
- Spark Streaming Checkpoint File Manager for MinIO ☆11 · Apr 25, 2023 · Updated 2 years ago
- A Kubernetes Scheduler Extender to provide gang scheduling support for Spark on Kubernetes ☆176 · Apr 23, 2023 · Updated 2 years ago
- A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap… ☆306 · Oct 30, 2025 · Updated 5 months ago