maropu/spark-sql-flow-plugin

Visualize column-level data lineage in Spark SQL

☆92

Alternatives and similar repositories for spark-sql-flow-plugin

Users who are interested in spark-sql-flow-plugin are comparing it to the libraries listed below.

apache / kyuubi-client
Client libraries of end users of Apache Kyuubi
☆11 · May 15, 2026 · Updated 2 months ago
passionke / starry
Fast Spark local mode
☆35 · Aug 20, 2018 · Updated 7 years ago
lhbench / lhbench
Lakehouse storage system benchmark
☆82 · Feb 22, 2023 · Updated 3 years ago
maropu / datasketches-spark
Data Sketches for Apache Spark
☆22 · Dec 22, 2022 · Updated 3 years ago
linkedin / coral
Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
☆907 · Updated this week
oap-project / gazelle_plugin
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
☆255 · Feb 21, 2023 · Updated 3 years ago
maropu / spark-data-repair-plugin
Provide functionality to build statistical models to repair dirty tabular data in Spark
☆12 · Apr 21, 2023 · Updated 3 years ago
melin / superior-sql-parser
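An antlr4-based SQL parser for multiple databases that extracts metadata from SQL and can be used in several data platform scenarios: extracting metadata from DDL statements, SQL permission checks, table-level lineage, SQL syntax validation, and so on. Supports Spark, Flink, Gauss, StarRocks, Oracle, MySQL, Postgresq…
☆417 · Jun 22, 2026 · Updated 3 weeks ago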
wankunde / sql-runner
☆17 · Mar 19, 2024 · Updated 2 years ago
apache / celeborn
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
☆1,056 · Updated this week
AbsaOSS / spline
Data Lineage Tracking And Visualization Solution
☆662 · Jul 13, 2026 · Updated last week
xskipper-io / xskipper
An Extensible Data Skipping Framework
☆50 · Jul 15, 2025 · Updated last year
squito / spark-memory
A tool to get better debug info on Spark's memory usage
☆42 · Aug 21, 2019 · Updated 6 years ago
datamechanics / delight
A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.
☆345 · May 31, 2024 · Updated 2 years ago
hortonworks-spark / spark-atlas-connector
A Spark Atlas connector to track data lineage in Apache Atlas
☆268 · Nov 16, 2022 · Updated 3 years ago
apache / uniffle
Uniffle is a high performance, general purpose Remote Shuffle Service.
☆451 · Updated this week
japila-books / pyspark-internals
The Internals of PySpark
☆28 · Dec 29, 2024 · Updated last year
apache / gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
☆1,576 · Updated this week
techsuppdiva / spark-cheat-sheets
This repo stores my Spark Tutorial slides.
☆15 · Feb 8, 2016 · Updated 10 years ago
CoxAutomotiveDataSolutions / spark-distcp
A re-implementation of Hadoop DistCP in Apache Spark
☆47 · Dec 20, 2023 · Updated 2 years ago
NetEase / spark-alarm
Alerting and monitoring tool for Apache Spark
☆23 · May 20, 2022 · Updated 4 years ago
reata / sqllineage
SQL Lineage Analysis Tool powered by Python
☆1,673 · Updated this week
leno1001 / spark_monitor
Queries the Spark REST API for applications, jobs, stages, executors, RDDs, streaming, environment, and other information to provide monitoring and alerting services.
☆11 · Nov 22, 2018 · Updated 7 years ago
liancheng / spear
A playground for experimenting with ideas that may apply to Spark SQL/Catalyst
☆143 · Jul 5, 2018 · Updated 8 years ago
apache / kyuubi-docker
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
☆16 · May 22, 2026 · Updated last month
qwshen / spark-flight-connector
A Spark Connector that reads data from / writes data to Arrow-Flight end-points with Arrow-Flight and Flight-SQL
☆49 · Jun 7, 2026 · Updated last month
uber / RemoteShuffleService
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
☆335 · Sep 29, 2023 · Updated 2 years ago
OpenLineage / OpenLineage
An Open Standard for lineage metadata collection
☆2,552 · Updated this week
tosh2230 / stairlight
A data lineage tool that detects table dependencies from rendered SQL statements.
☆30 · Mar 14, 2026 · Updated 4 months ago
apache / auron
The Auron accelerator for distributed computing framework (e.g., Spark) leverages native vectorized execution to accelerate query process…
☆1,778 · Updated this week
thesquelched / spark-lineage
Spark SQL listener to record lineage information
☆28 · Jan 24, 2021 · Updated 5 years ago
maropu / spark-tpcds-datagen
All the things about TPC-DS in Apache Spark
☆111 · Jun 15, 2023 · Updated 3 years ago
ClickHouse / spark-clickhouse-connector
Spark ClickHouse Connector built on the DataSourceV2 API
☆217 · Updated this week
HeartSaVioR / spark-state-tools
Spark Structured Streaming State Tools
☆34 · Jul 3, 2020 · Updated 6 years ago
apache / kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
☆2,353 · Updated this week
chermenin / spark-states
Custom state store providers for Apache Spark
☆92 · Feb 14, 2025 · Updated last year
tokern / data-lineage
Generate and Visualize Data Lineage from query history
☆324 · Aug 4, 2023 · Updated 2 years ago
eto-ai / spark-video
Processing videos on Apache Spark
☆13 · Feb 14, 2022 · Updated 4 years ago
akolb1 / gometastore
Go Client for Hive Metastore
☆14 · Dec 18, 2022 · Updated 3 years ago