linkedin / datahub-gma
General Metadata Architecture
☆121 Updated this week
Related projects:
- Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful … ☆142 Updated 2 months ago
- Spline agent for Apache Spark ☆183 Updated last week
- A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap… ☆296 Updated 8 months ago
- DataHub Actions is a framework for responding to changes to your DataHub Metadata Graph in real time. ☆42 Updated last week
- Apache Iceberg Documentation Site ☆42 Updated 7 months ago
- The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them ☆135 Updated 10 months ago
- ☆375 Updated this week
- A temporary home for LinkedIn's changes to Apache Iceberg (incubating) ☆62 Updated 4 months ago
- This Apache Atlas is built from the latest release source tarball and patched to be run in a Docker container. ☆139 Updated 8 months ago
- A tool to install, configure and manage Trino installations ☆26 Updated 2 years ago
- A data generator source connector for Flink SQL based on data-faker. ☆206 Updated last year
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆111 Updated last month
- Apache DataLab (incubating) ☆153 Updated 11 months ago
- FeatHub - A stream-batch unified feature store for real-time machine learning ☆313 Updated 3 months ago
- Replicates any database (CDC events) to Apache Iceberg (To Cloud Storage) ☆179 Updated this week
- A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source. ☆341 Updated 3 months ago
- A simple Spark-powered ETL framework that just works 🍺 ☆177 Updated 9 months ago
- A Spark Atlas connector to track data lineage in Apache Atlas ☆264 Updated last year
- Generate and Visualize Data Lineage from query history ☆309 Updated last year
- Spark ClickHouse Connector built on the DataSourceV2 API ☆181 Updated this week
- Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages. ☆780 Updated 2 weeks ago
- The Internals of Spark on Kubernetes ☆71 Updated 2 years ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆193 Updated last week
- Data Lineage Tracking And Visualization Solution ☆596 Updated last week
- Benchmarks for Apache Flink ☆164 Updated 2 months ago
- Spark Connector to read and write with Pulsar ☆111 Updated 5 months ago
- ☆173 Updated last year
- Repository of helm charts for deploying DataHub on a Kubernetes cluster ☆160 Updated this week
- Machine learning library of Apache Flink ☆299 Updated 5 months ago
- Apache Flink Website ☆143 Updated 2 weeks ago