SemyonSinchenko / spark-connect-example
An example of SparkConnect extension.
☆11 Updated 10 months ago
Alternatives and similar repositories for spark-connect-example:
Users who are interested in spark-connect-example are comparing it to the libraries listed below:
- A Python Library to support running data quality rules while the spark job is running ⚡ ☆168 Updated last week
- Custom PySpark Data Sources ☆37 Updated last week
- Delta lake and filesystem helper methods ☆50 Updated 11 months ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆208 Updated 2 months ago
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ☆192 Updated this week
- Delta Lake helper methods. No Spark dependency. ☆22 Updated 4 months ago
- Code snippets used in demos recorded for the blog. ☆29 Updated 2 weeks ago
- Snowflake Data Source for Apache Spark. ☆222 Updated last month
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in… ☆86 Updated 9 months ago
- Delta Lake Website ☆24 Updated last week
- A simple Spark-powered ETL framework that just works 🍺 ☆178 Updated last year
- Spark style guide ☆257 Updated 4 months ago
- Write property based tests easily on spark dataframes ☆19 Updated last year
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive ☆186 Updated last year
- The Internals of Delta Lake ☆183 Updated 2 weeks ago
- Performance Observability for Apache Spark ☆216 Updated this week
- Flowchart for debugging Spark applications ☆104 Updated 4 months ago
- Delta Lake helper methods in PySpark ☆315 Updated 4 months ago
- Don't Panic. This guide will help you when it feels like the end of the world. ☆23 Updated 7 months ago
- PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows ☆41 Updated 6 months ago
- Implements a gateway that speaks the SparkConnect protocol and drives a backend using Substrait (over ADBC Flight SQL). ☆16 Updated 3 months ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆118 Updated last week
- Delta Lake examples ☆214 Updated 3 months ago
- Task Metrics Explorer ☆13 Updated 5 years ago
- Spark-Radiant is Apache Spark Performance and Cost Optimizer ☆25 Updated last month
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. ☆75 Updated 9 months ago
- A Python package to submit and manage Apache Spark applications on Kubernetes. ☆41 Updated last week
- A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source. ☆345 Updated 7 months ago
- A tool to validate data, built around Apache Spark. ☆101 Updated 2 weeks ago
- Delta Acceptance Testing ☆20 Updated 6 months ago