cerndb / SparkPlugins
Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.
☆86 · Updated 9 months ago
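The plugin mechanism described above corresponds to the `org.apache.spark.api.plugin` API, available since Spark 3.0. The class below is a minimal sketch that assumes Spark 3.x with `spark-core` on the classpath. It runs custom code when each executor initializes and registers a user-provided gauge in the executor's metric registry. `DemoPlugin` and the metric name `freeHeapBytes` are hypothetical names chosen for illustration. This is not code taken from the SparkPlugins repository.

```scala
// Minimal sketch of a Spark 3.x plugin; assumes spark-core is on the classpath.
// DemoPlugin is a hypothetical name. In a real project, put it in a package and
// pass the fully qualified class name to spark.plugins.
import java.util.{Map => JMap}

import com.codahale.metrics.Gauge
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

class DemoPlugin extends SparkPlugin {

  // No driver-side component is needed here; Spark accepts null for "not needed".
  override def driverPlugin(): DriverPlugin = null

  // Spark creates this component on each executor and calls init() at executor start-up.
  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {

    override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
      // Custom code that runs once per executor as it is initialized.
      println(s"DemoPlugin initialized on executor ${ctx.executorID()} (${ctx.hostname()})")

      // User-provided monitoring probe: a gauge reporting this executor's free JVM heap,
      // evaluated whenever a configured metrics sink polls the registry.
      ctx.metricRegistry().register("freeHeapBytes", new Gauge[Long] {
        override def getValue: Long = Runtime.getRuntime.freeMemory()
      })
    }

    override def shutdown(): Unit = {
      // Runs when the executor shuts down.
      println("DemoPlugin shutting down")
    }
  }
}
```

To try it, package the class into a jar that is available on the driver and executor classpaths, then enable it with `--conf spark.plugins=DemoPlugin`. The setting takes a comma-separated list of plugin class names. Metrics registered through `PluginContext.metricRegistry()` are reported through the configured metrics sinks under the `plugin.<plugin class name>` namespace.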
Alternatives and similar repositories for SparkPlugins:
Users who are interested in SparkPlugins are comparing it to the libraries listed below.
- Spark-Radiant is an Apache Spark performance and cost optimizer ☆25 · Updated 3 weeks ago
- Custom state store providers for Apache Spark ☆92 · Updated 2 years ago
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive ☆186 · Updated last year
- The Internals of Delta Lake ☆183 · Updated 2 weeks ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆118 · Updated last week
- Spark SQL index for Parquet tables ☆134 · Updated 3 years ago
- Spark Structured Streaming State Tools ☆34 · Updated 4 years ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆208 · Updated 2 months ago
- A re-implementation of Hadoop DistCp in Apache Spark ☆47 · Updated last year
- Flowchart for debugging Spark applications ☆104 · Updated 4 months ago
- A library that brings useful functions from various modern database management systems to Apache Spark ☆58 · Updated last year
- A tool to get better debug info on Spark's memory usage ☆42 · Updated 5 years ago
- Visualize column-level data lineage in Spark SQL ☆88 · Updated 2 years ago
- All the things about TPC-DS in Apache Spark ☆105 · Updated last year
- Examples of Spark 3.0 ☆47 · Updated 4 years ago
- ACID Data Source for Apache Spark based on Hive ACID ☆97 · Updated 3 years ago
- Type-class-based data cleansing library for Apache Spark SQL ☆79 · Updated 5 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆114 · Updated this week
- Profiler for large-scale distributed Java applications (Spark, Scalding, MapReduce, Hive, ...) on YARN. ☆126 · Updated 6 years ago
- An Extensible Data Skipping Framework ☆43 · Updated 2 weeks ago
- Spline agent for Apache Spark ☆190 · Updated 3 weeks ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆72 · Updated 3 years ago
- RocksDB state storage implementation for Structured Streaming. ☆17 · Updated 4 years ago
- ☆104 · Updated last year
- ☆63 · Updated 5 years ago
- A Spark SQL extension that provides SQL Standard Authorization for Apache Spark | This repo has been contributed to Apache Kyuubi | The project has been migrated to Apa… ☆172 · Updated 2 years ago
- The Internals of Spark on Kubernetes ☆70 · Updated 2 years ago
- ☆102 · Updated 4 years ago
- An S3 shuffle plugin for Apache Spark to enable elastic scaling for generic Spark workloads. ☆38 · Updated 8 months ago
- Schema Registry integration for Apache Spark ☆40 · Updated 2 years ago