pauldeschacht/SparkDataLineageCapture

Capture the logical plan from Spark (SQL)

☆22

Alternatives and similar repositories for SparkDataLineageCapture

Users who are interested in SparkDataLineageCapture are comparing it to the libraries listed below.

tusharchou / local-data-platform
python library for iceberg lake house on your local
☆14 · Jan 8, 2026 · Updated 6 months ago
zzzming / java-dag-scheduler
Java task scheduler that executes threads whose dependencies are managed by a directed acyclic graph
☆26 · Feb 2, 2017 · Updated 9 years ago
BauplanLabs / no-jvm-wap-with-iceberg
A write-audit-publish implementation on a data lake without the JVM
☆45 · Aug 12, 2024 · Updated last year
tmscarla / akka-big-data
Implementation of a Big Data (batch and stream) distributed processing engine in Java using Akka actors.
☆12 · Feb 20, 2023 · Updated 3 years ago
thesquelched / spark-lineage
Spark SQL listener to record lineage information
☆28 · Jan 24, 2021 · Updated 5 years ago
mikebryant / graphite-opentsdb-finder
An OpenTSDB finder for graphite.
☆17 · May 19, 2015 · Updated 11 years ago
avriiil / stream-this-dataset
Code to convert static datasets into simulated data streams
☆15 · Apr 6, 2023 · Updated 3 years ago
Enet4 / byteordered
A Rust library for reading and writing data with byte order awareness.
☆16 · Aug 3, 2021 · Updated 4 years ago
avensolutions / cdc-at-scale-using-spark
Scalable CDC Pattern Implemented using PySpark
☆18 · Oct 8, 2025 · Updated 9 months ago
opendatamesh-initiative / odm-platform
A platform to manage the data product life cycle
☆22 · Mar 25, 2026 · Updated 3 months ago
bbstilson / sbt-codeartifact
An sbt plugin for publishing packages to AWS CodeArtifact.
☆26 · May 29, 2024 · Updated 2 years ago
hearit-io / redis-channels
Fast, reliable, and scalable channels implementation based on Redis streams.
☆11 · Jun 25, 2024 · Updated 2 years ago
AntonotnaWang / imagenet_and_pytorch_pretrained_model_id_mapping
Showing the relationship between ImageNet ID and labels and pytorch pre-trained model output ID and labels
☆10 · Oct 11, 2020 · Updated 5 years ago
spacejam / tx
software transactional memory in rust
☆14 · Jul 20, 2021 · Updated 5 years ago
jonhoo / strawpoll
☆19 · Jul 5, 2026 · Updated 2 weeks ago
ryoppippi / sveltekit-daisyui-template
Sveltekit + Tailwind + DaisyUI
☆13 · Feb 17, 2023 · Updated 3 years ago
yurisasuke / go-taskq
A simple golang job queue
☆13 · Jan 19, 2023 · Updated 3 years ago
RoundYuanYuan / spark-field-lineage
Spark field lineage
☆32 · Jun 7, 2022 · Updated 4 years ago
firezone / sans-io-blog-example
Code snippets accompanying the sans-IO blog post.
☆19 · Aug 24, 2024 · Updated last year
dianping / hiveweb
☆15 · Aug 25, 2014 · Updated 11 years ago
SjorsWijsman / sveltekit-supabase
Example template for SvelteKit & Supabase
☆15 · Mar 24, 2022 · Updated 4 years ago
y2k4life / ray-tracer-challenge-rust
The Ray Tracer Challenge by Jamis Buck written in Rust. Broken down chapter by chapter.
☆11 · Feb 27, 2026 · Updated 4 months ago
ypt / experiment-flink-cdc-connectors
An exploration of Flink and change-data-capture via flink-cdc-connectors
☆11 · Jul 7, 2021 · Updated 5 years ago
iMerica / dj-models
Use the Django ORM in any Python web framework
☆10 · Feb 16, 2019 · Updated 7 years ago
abhirockzz / practical-redis
Code for the book - Practical Redis
☆17 · Jan 5, 2019 · Updated 7 years ago
balamaci / muninn
Java Alerting Framework for ElasticSearch
☆12 · May 20, 2016 · Updated 10 years ago
Karasiq / proxyutils
Scala HTTP/SOCKS proxy library, based on akka-streams
☆10 · Nov 3, 2018 · Updated 7 years ago
dstreev / hdp-data-gen
Hortonworks Data Platform Data Generation Tool
☆13 · Nov 30, 2017 · Updated 8 years ago
yaooqinn / multi-tenancy-spark
A Fully HiveServer2-like Multi-tenancy Spark Thrift Server Supporting Impersonation and Multi-SparkContext with Ranger Authorization (GO …
☆10 · Jul 7, 2022 · Updated 4 years ago
duckdb-wasm-examples / observableplot-svelte-typescript
Using DuckDB-Wasm to query a parquet file and plot the results using Observable Plot.
☆18 · Dec 31, 2022 · Updated 3 years ago
631086083 / tairClient
☆10 · Aug 13, 2021 · Updated 4 years ago
bluecolor / octopus
Open source task scheduler with dependency management
☆15 · Jul 1, 2018 · Updated 8 years ago
wooplevip / sedis
SQL for Redis
☆11 · Sep 16, 2022 · Updated 3 years ago
hbutani / icebergSQL
Integration of Iceberg table management into Spark SQL
☆11 · Jan 21, 2020 · Updated 6 years ago
munckymagik / go-concurrency-patterns-in-rust
Rob Pike's examples from the "Go Concurrency Patterns" talk, but in Rust
☆13 · Jul 9, 2022 · Updated 4 years ago
irvanariyanto / DistributedSystems
☆27 · Oct 20, 2016 · Updated 9 years ago
t3dotgg / railway-bn
☆12 · Jul 11, 2022 · Updated 4 years ago
xavierguihot / spark_helper
A bunch of low-level basic methods for data processing and monitoring with Scala Spark
☆10 · Jun 29, 2018 · Updated 8 years ago
MartijnVisser / flink-only-sql
Traditionally, engineers were needed to implement business logic via data pipelines before business users can start using it. Using this …
☆12 · Updated this week