Superglue is a lineage-tracking tool built to help visualize the propagation of data through complex pipelines composed of tables, jobs and reports.
☆161 Dec 10, 2022 Updated 3 years ago
Alternatives and similar repositories for superglue
Users that are interested in superglue are comparing it to the libraries listed below.
- Streaming PDF processor for Scala ☆13 Apr 2, 2025 Updated last year
- DBND is an agile pipeline framework that helps data engineering teams track and orchestrate their data processes. ☆268 Mar 4, 2026 Updated last month
- Data Lineage Tracking And Visualization Solution ☆657 Apr 3, 2026 Updated last week
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆98 Updated this week
- Generic Data Ingestion & Dispersal Library for Hadoop ☆481 Mar 19, 2023 Updated 3 years ago
- Collaboration app for sharing and reviewing jupyter notebooks ☆16 May 25, 2025 Updated 10 months ago
- Make dbt docs and Apache Superset talk to one another ☆156 Feb 12, 2026 Updated 2 months ago
- Collect, aggregate, and visualize a data ecosystem's metadata ☆2,160 Updated this week
- Code review for data in dbt ☆495 Jan 3, 2025 Updated last year
- Egeria core ☆907 Apr 1, 2026 Updated 2 weeks ago
- The sane way of building a data layer in Airflow ☆24 Dec 5, 2019 Updated 6 years ago
- Build your feature store with macros right within your dbt repository ☆39 Dec 16, 2022 Updated 3 years ago
- Scala API for Apache Spark SQL high-order functions ☆14 Aug 4, 2023 Updated 2 years ago
- An Open Standard for lineage metadata collection ☆2,396 Updated this week
- 🐳 The stupidly simple CLI workspace for your data warehouse. ☆728 Feb 8, 2023 Updated 3 years ago
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,605 Apr 1, 2026 Updated 2 weeks ago
- A CLI to manage and monitor permissions in AWS Lake Formation ☆25 Feb 8, 2023 Updated 3 years ago
- Data Contracts engine for the modern data stack. https://www.soda.io ☆2,331 Updated this week
- Simple samples for writing ETL transform scripts in Python ☆25 Jan 20, 2026 Updated 2 months ago
- 🦘 The Grouparoo Monorepo - open source customer data sync framework ☆772 Apr 8, 2022 Updated 4 years ago
- Playground site for creating/validating data contracts ☆11 Aug 9, 2025 Updated 8 months ago
- adidas Data Mesh implementation ☆12 May 13, 2022 Updated 3 years ago
- Singer.io tap for generic Rest API ☆24 Mar 30, 2026 Updated 2 weeks ago
- Export Airflow metrics (from mysql) in prometheus format ☆29 Apr 15, 2025 Updated last year
- Parse dbt artifacts and search dbt models with Algolia ☆52 May 6, 2021 Updated 4 years ago
- Hopsworks - Data-Intensive AI platform with a Feature Store ☆1,290 Feb 10, 2025 Updated last year
- Dataform is a framework for managing SQL based data operations in BigQuery ☆970 Apr 9, 2026 Updated last week
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆73 Mar 14, 2021 Updated 5 years ago
- Standalone alternatives to Kafka Connect Connectors ☆46 Mar 10, 2026 Updated last month
- Code Repository for GCP: Complete Google Data Engineer and Cloud Architect Guide(v), Published by Packt ☆16 Jan 30, 2023 Updated 3 years ago
- Make Structs Easy (MSE) ☆18 Jun 22, 2020 Updated 5 years ago
- Nessie: Transactional Catalog for Data Lakes with Git-like semantics ☆1,450 Updated this week
- An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads. ☆431 Jan 14, 2022 Updated 4 years ago
- An Apache Mesos Framework that allows for replaying load over and over and over (and over) again ☆10 Aug 10, 2015 Updated 10 years ago
- re_data - fix data issues before your users & CEO would discover them 😊 ☆1,570 Apr 30, 2024 Updated last year
- Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code. ☆127 Aug 3, 2021 Updated 4 years ago
- Schema modelling framework for decentralised domain-driven ownership of data. ☆261 Dec 5, 2023 Updated 2 years ago
- Builds Airflow DAGs from configuration files. Powers all DAGs on the Etsy Data Platform ☆259 Jul 19, 2023 Updated 2 years ago
- SQL Lineage Analysis Tool powered by Python ☆1,637 Updated this week