cerndb / SparkPlugins
Code and examples of how to write and deploy Apache Spark plugins. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.
☆89Updated 2 months ago
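As a rough illustration of the idea described above, the following is a minimal sketch of an executor-side plugin that registers one custom gauge. It assumes the plugin API introduced in Spark 3.0 (`org.apache.spark.api.plugin`) and the Dropwizard metrics library that ships with Spark. The class name `DemoMetricsPlugin` and the metric name `executorUptimeMillis` are hypothetical and are not taken from the SparkPlugins repository.

```scala
import java.util.{Map => JMap}

import com.codahale.metrics.Gauge
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

// Hypothetical example class; not part of the SparkPlugins repository.
class DemoMetricsPlugin extends SparkPlugin {

  // This sketch needs no driver-side component, so it returns null.
  override def driverPlugin(): DriverPlugin = null

  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {

    // Invoked by each executor during its initialization.
    override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
      val startedAt = System.currentTimeMillis()

      // Register a user-defined gauge (hypothetical metric name) in the
      // metric registry that Spark provides to the plugin.
      ctx.metricRegistry().register("executorUptimeMillis", new Gauge[Long] {
        override def getValue: Long = System.currentTimeMillis() - startedAt
      })
    }
  }
}
```

Assuming the compiled class is on the application classpath, a plugin like this would typically be enabled through the `spark.plugins` configuration property, for example `--conf spark.plugins=DemoMetricsPlugin` on `spark-submit`.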
Alternatives and similar repositories for SparkPlugins
Users who are interested in SparkPlugins are comparing it to the libraries listed below.
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive☆186Updated 2 years ago
- Spark-Radiant is Apache Spark Performance and Cost Optimizer☆25Updated 6 months ago
- The Internals of Delta Lake☆184Updated 6 months ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A…☆125Updated last month
- A library that provides useful extensions to Apache Spark and PySpark.☆227Updated this week
- Custom state store providers for Apache Spark☆92Updated 5 months ago
- A tool to get better debug info on Spark's memory usage☆42Updated 5 years ago
- Flowchart for debugging Spark applications☆105Updated 9 months ago
- A library that brings useful functions from various modern database management systems to Apache Spark☆59Updated last year
- ACID Data Source for Apache Spark based on Hive ACID☆97Updated 4 years ago
- ☆106Updated 2 years ago
- All the things about TPC-DS in Apache Spark☆106Updated 2 years ago
- An Extensible Data Skipping Framework☆47Updated last week
- Spark SQL index for Parquet tables☆134Updated 4 years ago
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.☆76Updated last year
- Avro SerDe for Apache Spark structured APIs.☆235Updated last month
- A simple Spark-powered ETL framework that just works 🍺☆181Updated 2 weeks ago
- Visualize column-level data lineage in Spark SQL☆92Updated 3 years ago
- Profiler for large-scale distributed Java applications (Spark, Scalding, MapReduce, Hive, ...) on YARN.☆127Updated 6 years ago
- ☆63Updated 5 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines☆124Updated this week
- The Internals of Spark on Kubernetes☆71Updated 3 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures.☆73Updated 4 years ago
- Spark metrics related custom classes and sinks (e.g. Prometheus)☆183Updated 2 years ago
- A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.☆344Updated last year
- Spark Structured Streaming State Tools☆34Updated 5 years ago
- Spline agent for Apache Spark☆195Updated last week
- Snowflake Data Source for Apache Spark.☆226Updated 3 weeks ago
- Type-class-based data cleansing library for Apache Spark SQL☆78Updated 6 years ago
- Spark* shuffle plugin that supports shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local-dis…☆21Updated last year