intuit / superglue
Superglue is a lineage-tracking tool built to help visualize the propagation of data through complex pipelines composed of tables, jobs and reports.
☆157 Updated 2 years ago
Alternatives and similar repositories for superglue
Users who are interested in superglue are comparing it to the libraries listed below.
- Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html ☆61 Updated 2 years ago
- A simple Spark-powered ETL framework that just works ☆181 Updated last month
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆95 Updated last week
- A library that provides useful extensions to Apache Spark and PySpark. ☆226 Updated 3 months ago
- Adapter for dbt that executes dbt pipelines on Apache Flink ☆95 Updated last year
- [ARCHIVED] The Presto adapter plugin for dbt Core ☆33 Updated last year
- Generate and Visualize Data Lineage from query history ☆326 Updated last year
- Snowflake Data Source for Apache Spark. ☆226 Updated last week
- Multi-hop declarative data pipelines ☆115 Updated 2 weeks ago
- DataQuality for BigData ☆144 Updated last year
- DBND is an agile pipeline framework that helps data engineering teams track and orchestrate their data processes. ☆266 Updated 3 months ago
- The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them ☆135 Updated last year
- ☆80 Updated 2 months ago
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes ☆64 Updated 3 years ago
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database ☆75 Updated 3 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆88 Updated last year
- A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source. ☆344 Updated last year
- ☆63 Updated 5 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆124 Updated this week
- Metamapper is a data discovery and documentation platform for improving how teams understand and interact with their data. ☆79 Updated last week
- Data ingestion library for Amundsen to build graph and search index ☆205 Updated last year
- Schema modelling framework for decentralised domain-driven ownership of data. ☆254 Updated last year
- Extensible streaming ingestion pipeline on top of Apache Spark ☆45 Updated last week
- CLI tool to bulk migrate the tables from one catalog to another without a data copy ☆79 Updated 2 months ago
- ☆105 Updated last year
- Pylint plugin for static code analysis on Airflow code ☆95 Updated 4 years ago
- Apache Spark build compatible with AWS Glue Data Catalog. ☆19 Updated 3 years ago
- Amundsen Gremlin ☆21 Updated 2 years ago
- ThirdEye is an integrated tool for realtime monitoring of time series and interactive root-cause analysis. It enables anyone inside an or… ☆93 Updated 2 years ago
- A temporary home for LinkedIn's changes to Apache Iceberg (incubating) ☆61 Updated 6 months ago