cerndb / SparkPlugins
Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.
☆88 Updated last year
Alternatives and similar repositories for SparkPlugins:
Users who are interested in SparkPlugins are comparing it to the libraries listed below.
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆120 Updated this week
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive ☆185 Updated 2 years ago
- Spark-Radiant is Apache Spark Performance and Cost Optimizer ☆25 Updated 3 months ago
- ACID Data Source for Apache Spark based on Hive ACID ☆97 Updated 3 years ago
- The Internals of Delta Lake ☆184 Updated 2 months ago
- A tool to get better debug info on spark's memory usage ☆42 Updated 5 years ago
- Custom state store providers for Apache Spark ☆92 Updated last month
- A library that brings useful functions from various modern database management systems to Apache Spark ☆58 Updated last year
- An Extensible Data Skipping Framework ☆43 Updated 2 months ago
- Flowchart for debugging Spark applications ☆105 Updated 6 months ago
- ☆63 Updated 5 years ago
- All the things about TPC-DS in Apache Spark ☆104 Updated last year
- ☆105 Updated last year
- A library that provides useful extensions to Apache Spark and PySpark. ☆221 Updated last week
- Spark SQL index for Parquet tables ☆134 Updated 3 years ago
- The Internals of Spark on Kubernetes ☆71 Updated 2 years ago
- type-class based data cleansing library for Apache Spark SQL ☆78 Updated 5 years ago
- The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them ☆135 Updated last year
- Rocksdb state storage implementation for Structured Streaming. ☆17 Updated 4 years ago
- Visualize column-level data lineage in Spark SQL ☆90 Updated 2 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆72 Updated 4 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆88 Updated last year
- Spark Structured Streaming State Tools ☆34 Updated 4 years ago
- Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer. ☆37 Updated 2 years ago
- Spline agent for Apache Spark ☆191 Updated 2 weeks ago
- A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap… ☆299 Updated last year
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. ☆75 Updated 11 months ago
- A S3 Shuffle plugin for Apache Spark to enable elastic scaling for generic Spark workloads. ☆39 Updated 10 months ago
- Magic to help Spark pipelines upgrade ☆34 Updated 6 months ago
- Enabling Spark Optimization through Cross-stack Monitoring and Visualization ☆47 Updated 7 years ago