Superglue is a lineage-tracking tool built to help visualize the propagation of data through complex pipelines composed of tables, jobs and reports.
☆160 · Dec 10, 2022 · Updated 3 years ago
Alternatives and similar repositories for superglue
Users who are interested in superglue are comparing it to the libraries listed below.
- Streaming PDF processor for Scala ☆13 · Apr 2, 2025 · Updated 11 months ago
- DBND is an agile pipeline framework that helps data engineering teams track and orchestrate their data processes. ☆268 · Mar 26, 2025 · Updated 11 months ago
- Data Lineage Tracking And Visualization Solution ☆656 · Updated this week
- Collaboration app for sharing and reviewing jupyter notebooks ☆16 · May 25, 2025 · Updated 9 months ago
- The sane way of building a data layer in Airflow ☆24 · Dec 5, 2019 · Updated 6 years ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆97 · Updated this week
- Code review for data in dbt ☆494 · Jan 3, 2025 · Updated last year
- Simple samples for writing ETL transform scripts in Python ☆24 · Jan 20, 2026 · Updated last month
- Egeria core ☆898 · Updated this week
- A data access control framework for Open Policy Agent ☆37 · Jun 12, 2024 · Updated last year
- Make dbt docs and Apache Superset talk to one another ☆156 · Feb 12, 2026 · Updated 2 weeks ago
- Generic Data Ingestion & Dispersal Library for Hadoop ☆482 · Mar 19, 2023 · Updated 2 years ago
- Collect, aggregate, and visualize a data ecosystem's metadata ☆2,129 · Feb 20, 2026 · Updated last week
- An Open Standard for lineage metadata collection ☆2,330 · Updated this week
- Schema modelling framework for decentralised domain-driven ownership of data. ☆261 · Dec 5, 2023 · Updated 2 years ago
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,588 · Feb 17, 2026 · Updated 2 weeks ago
- A CLI to manage and monitor permissions in AWS Lake Formation ☆25 · Feb 8, 2023 · Updated 3 years ago
- 🐳 The stupidly simple CLI workspace for your data warehouse. ☆728 · Feb 8, 2023 · Updated 3 years ago
- Generate and Visualize Data Lineage from query history ☆327 · Aug 4, 2023 · Updated 2 years ago
- Export Airflow metrics (from mysql) in prometheus format ☆29 · Apr 15, 2025 · Updated 10 months ago
- Dataform is a framework for managing SQL based data operations in BigQuery ☆962 · Updated this week
- Nessie: Transactional Catalog for Data Lakes with Git-like semantics ☆1,425 · Updated this week
- Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code. ☆127 · Aug 3, 2021 · Updated 4 years ago
- Build your feature store with macros right within your dbt repository ☆39 · Dec 16, 2022 · Updated 3 years ago
- Data Contracts engine for the modern data stack. https://www.soda.io ☆2,298 · Updated this week
- 🦘 The Grouparoo Monorepo - open source customer data sync framework ☆772 · Apr 8, 2022 · Updated 3 years ago
- An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads. ☆431 · Jan 14, 2022 · Updated 4 years ago
- An open protocol for secure data sharing ☆920 · Updated this week
- Hopsworks - Data-Intensive AI platform with a Feature Store ☆1,286 · Feb 10, 2025 · Updated last year
- Airflow support for Marquez ☆30 · Dec 11, 2020 · Updated 5 years ago
- Stackable Operator for Apache Kafka ☆27 · Updated this week
- re_data - fix data issues before your users & CEO would discover them 😊 ☆1,569 · Apr 30, 2024 · Updated last year
- ☆11 · Nov 26, 2024 · Updated last year
- Playground site for creating/validating data contracts ☆11 · Aug 9, 2025 · Updated 6 months ago
- ☆12 · Feb 13, 2025 · Updated last year
- Operator for managing the Spark clusters on Kubernetes and OpenShift. ☆159 · Nov 18, 2021 · Updated 4 years ago
- Data Pipeline Framework using the singer.io spec ☆658 · Updated this week
- Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting… ☆4,740 · Feb 19, 2026 · Updated last week
- Standalone alternatives to Kafka Connect Connectors ☆46 · Updated this week