cerndb / SparkPlugins
Code and examples of how to write and deploy Apache Spark plugins. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.
☆94Updated 7 months ago
Alternatives and similar repositories for SparkPlugins
Users that are interested in SparkPlugins are comparing it to the libraries listed below
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive☆184Updated 2 months ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A…☆130Updated 2 weeks ago
- The Internals of Delta Lake☆187Updated 3 weeks ago
- A library that brings useful functions from various modern database management systems to Apache Spark☆61Updated 2 years ago
- ACID Data Source for Apache Spark based on Hive ACID☆97Updated 4 years ago
- All the things about TPC-DS in Apache Spark☆108Updated 2 years ago
- A library that provides useful extensions to Apache Spark and PySpark.☆231Updated last week
- Spark SQL index for Parquet tables☆134Updated 4 years ago
- Custom state store providers for Apache Spark☆92Updated 10 months ago
- A tool to get better debug info on Spark's memory usage☆42Updated 6 years ago
- Spark-Radiant is an Apache Spark performance and cost optimizer☆25Updated 11 months ago
- Flowchart for debugging Spark applications☆107Updated last year
- ☆107Updated 2 years ago
- The Internals of Spark on Kubernetes☆72Updated 3 years ago
- Visualize column-level data lineage in Spark SQL☆92Updated 3 years ago
- A re-implementation of Hadoop DistCP in Apache Spark☆47Updated 2 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines☆123Updated 3 weeks ago
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.☆76Updated last year
- Spline agent for Apache Spark☆200Updated 2 weeks ago
- Spark Structured Streaming State Tools☆34Updated 5 years ago
- An S3 shuffle plugin for Apache Spark to enable elastic scaling for generic Spark workloads.☆50Updated 3 months ago
- A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap…☆302Updated last month
- An Extensible Data Skipping Framework☆47Updated 5 months ago
- A simple Spark-powered ETL framework that just works 🍺☆181Updated 2 months ago
- Profiler for large-scale distributed Java applications (Spark, Scalding, MapReduce, Hive, ...) on YARN.☆128Updated 7 years ago
- A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.☆346Updated last year
- Avro SerDe for Apache Spark structured APIs.☆238Updated 6 months ago
- An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.☆430Updated 3 years ago
- A tool to validate data, built around Apache Spark.☆100Updated last week
- A Spark Connector that reads data from / writes data to Arrow-Flight end-points with Arrow-Flight and Flight-SQL☆46Updated last week