cerndb / SparkPlugins
Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.
☆89 · Updated 3 weeks ago
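As a rough illustration of what "running custom code on the executors" and "user-provided monitoring probes" look like, here is a minimal sketch against the Spark 3.x plugin API (`org.apache.spark.api.plugin`). It assumes Spark 3.0 or later and the Dropwizard metrics library that Spark bundles. The class name `DemoMetricsPlugin` and the gauge name `freeMemoryBytes` are hypothetical and not taken from the cerndb repository.

```scala
import java.util.{Map => JMap}

import com.codahale.metrics.{Gauge, MetricRegistry}
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

// Hypothetical example plugin: registers one executor-side gauge.
class DemoMetricsPlugin extends SparkPlugin {

  // No driver-side component is needed; the API allows returning null here.
  override def driverPlugin(): DriverPlugin = null

  // The executor-side component; init() runs once as each executor starts.
  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {
    override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
      val registry: MetricRegistry = ctx.metricRegistry()
      // Hypothetical probe: report the executor JVM's free heap in bytes.
      registry.register("freeMemoryBytes", new Gauge[Long] {
        override def getValue(): Long = Runtime.getRuntime.freeMemory()
      })
    }
  }
}
```

To deploy such a class, the compiled jar would be put on the driver and executor classpath (for example with `--jars`) and the plugin enabled through the `spark.plugins` configuration, e.g. `--conf spark.plugins=DemoMetricsPlugin`. The gauge is then reported through whatever metrics sinks the application has configured.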
Alternatives and similar repositories for SparkPlugins
Users interested in SparkPlugins are comparing it to the libraries listed below.
- Spark-Radiant is an Apache Spark Performance and Cost Optimizer ☆25 · Updated 5 months ago
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive ☆185 · Updated 2 years ago
- The Internals of Delta Lake ☆184 · Updated 4 months ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆125 · Updated last week
- Flowchart for debugging Spark applications ☆105 · Updated 8 months ago
- A library that brings useful functions from various modern database management systems to Apache Spark ☆59 · Updated last year
- Custom state store providers for Apache Spark ☆92 · Updated 3 months ago
- ACID Data Source for Apache Spark based on Hive ACID ☆97 · Updated 3 years ago
- ☆105 · Updated last year
- All the things about TPC-DS in Apache Spark ☆106 · Updated last year
- A library that provides useful extensions to Apache Spark and PySpark. ☆224 · Updated 2 months ago
- Visualize column-level data lineage in Spark SQL ☆91 · Updated 3 years ago
- The Internals of Spark on Kubernetes ☆71 · Updated 3 years ago
- A re-implementation of Hadoop DistCP in Apache Spark ☆47 · Updated last year
- Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer. ☆37 · Updated 2 years ago
- A tool to get better debug info on Spark's memory usage ☆42 · Updated 5 years ago
- Spark SQL index for Parquet tables ☆134 · Updated 4 years ago
- An Extensible Data Skipping Framework ☆47 · Updated 4 months ago
- Developing Spark External Data Sources using the V2 API ☆48 · Updated 7 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆72 · Updated 4 years ago
- Spark Structured Streaming State Tools ☆34 · Updated 4 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆88 · Updated last year
- Profiler for large-scale distributed Java applications (Spark, Scalding, MapReduce, Hive,...) on YARN. ☆127 · Updated 6 years ago
- Spline agent for Apache Spark ☆193 · Updated 2 weeks ago
- A simple Spark-powered ETL framework that just works 🍺 ☆181 · Updated 3 weeks ago
- A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap… ☆299 · Updated last year
- Java bindings for https://github.com/facebookincubator/velox ☆27 · Updated this week
- ☆63 · Updated 5 years ago
- Spark* shuffle plugin for support shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local-dis… ☆21 · Updated last year
- Spark connector for SFTP ☆100 · Updated 2 years ago