cerndb / SparkPlugins
Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.
☆85 Updated 7 months ago
Related projects
Alternatives and complementary repositories for SparkPlugins
- Custom state store providers for Apache Spark ☆93 Updated 2 years ago
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive ☆187 Updated last year
- Spark-Radiant is Apache Spark Performance and Cost Optimizer ☆25 Updated 2 years ago
- The Internals of Delta Lake ☆182 Updated last month
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆111 Updated this week
- A library that provides useful extensions to Apache Spark and PySpark. ☆196 Updated 2 weeks ago
- Flowchart for debugging Spark applications ☆101 Updated last month
- ACID Data Source for Apache Spark based on Hive ACID ☆97 Updated 3 years ago
- ☆63 Updated 5 years ago
- All the things about TPC-DS in Apache Spark ☆104 Updated last year
- A library that brings useful functions from various modern database management systems to Apache Spark ☆56 Updated last year
- A tool to get better debug info on spark's memory usage ☆42 Updated 5 years ago
- A re-implementation of Hadoop DistCP in Apache Spark ☆44 Updated 11 months ago
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. ☆75 Updated 6 months ago
- The Internals of Spark on Kubernetes ☆70 Updated 2 years ago
- Spline agent for Apache Spark ☆185 Updated 2 weeks ago
- Spark SQL index for Parquet tables ☆133 Updated 3 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆72 Updated 3 years ago
- type-class based data cleansing library for Apache Spark SQL ☆79 Updated 5 years ago
- Examples of Spark 3.0 ☆47 Updated 4 years ago
- Visualize column-level data lineage in Spark SQL ☆87 Updated 2 years ago
- ☆104 Updated last year
- Spark Structured Streaming State Tools ☆34 Updated 4 years ago
- A simple Spark-powered ETL framework that just works 🍺 ☆178 Updated 11 months ago
- Developing Spark External Data Sources using the V2 API ☆46 Updated 6 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆111 Updated this week
- Apache Spark Kubernetes Operator ☆65 Updated last week
- Profiler for large-scale distributed java applications (Spark, Scalding, MapReduce, Hive,...) on YARN. ☆125 Updated 6 years ago
- Magic to help Spark pipelines upgrade ☆34 Updated last month