dataflint / spark
Performance Observability for Apache Spark
☆239 Updated last week
Alternatives and similar repositories for spark:
Users who are interested in spark are comparing it to the libraries listed below.
- A Python Library to support running data quality rules while the spark job is running ⚡ ☆180 Updated last week
- The Trino (https://trino.io/) adapter plugin for dbt (https://getdbt.com) ☆231 Updated this week
- A library that provides useful extensions to Apache Spark and PySpark. ☆221 Updated last week
- CLI tool to bulk migrate tables from one catalog to another without a data copy ☆76 Updated last month
- ☆189 Updated last week
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆120 Updated last week
- A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source. ☆345 Updated 10 months ago
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ☆212 Updated this week
- Spark style guide ☆258 Updated 6 months ago
- A highly efficient daemon for streaming data from Kafka into Delta Lake ☆393 Updated 3 weeks ago
- Low Cost, Simple and Scalable Way of Data Replication to Apache Iceberg/Cloud/Data Lake ☆238 Updated this week
- Helm charts for Trino and Trino Gateway ☆161 Updated this week
- The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them ☆135 Updated last year
- Apache Hive Metastore as a Standalone server in Docker ☆68 Updated 7 months ago
- Open Control Plane for Tables in Data Lakehouse ☆333 Updated this week
- Schema modelling framework for decentralised domain-driven ownership of data. ☆251 Updated last year
- ☆261 Updated 5 months ago
- Apache Spark Kubernetes Operator ☆106 Updated this week
- Turning PySpark Into a Universal DataFrame API ☆378 Updated this week
- A simplified, lightweight ETL Framework based on Apache Spark ☆585 Updated last year
- This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spa… ☆738 Updated this week
- Repository of helm charts for deploying DataHub on a Kubernetes cluster ☆178 Updated last week
- Delta Lake helper methods in PySpark ☆322 Updated 6 months ago
- Custom PySpark Data Sources ☆41 Updated 2 months ago
- Spline agent for Apache Spark ☆191 Updated last week
- The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever… ☆240 Updated last month
- ☆79 Updated last year
- Official Dockerfile for Apache Spark ☆128 Updated last month
- A simple Spark-powered ETL framework that just works 🍺 ☆181 Updated last month
- Sparglim✨ makes PySpark App Configurable and Deploy Spark Connect Server Easier! ☆37 Updated 3 weeks ago