apache / incubator-datalab
Apache DataLab (incubating)
☆153Updated 11 months ago
Related projects:
- The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them☆135Updated 10 months ago
- A temporary home for LinkedIn's changes to Apache Iceberg (incubating)☆62Updated 4 months ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi…☆91Updated this week
- A simple Spark-powered ETL framework that just works 🍺☆177Updated 9 months ago
- ☆77Updated last year
- Replicates any database (CDC events) to Apache Iceberg (To Cloud Storage)☆179Updated this week
- A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap…☆296Updated 8 months ago
- Simple project to expose a catalog over REST using a Java catalog backend☆103Updated this week
- CLI tool to bulk-migrate tables from one catalog to another without a data copy☆51Updated this week
- Storage connector for Trino☆90Updated 3 weeks ago
- ☆375Updated this week
- The Internals of Delta Lake☆180Updated last month
- Egeria's Guidance on Governance as well as large media files such as presentations and movies☆101Updated last year
- Schema Registry☆15Updated 3 months ago
- The Internals of Spark on Kubernetes☆71Updated 2 years ago
- DataHub Actions is a framework for responding to changes to your DataHub Metadata Graph in real time.☆42Updated last week
- Smart Automation Tool for building modern Data Lakes and Data Pipelines☆104Updated this week
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A…☆111Updated last month
- Example for article Running Spark 3 with standalone Hive Metastore 3.0☆96Updated last year
- REST API for Apache Spark on K8S or YARN☆89Updated last week
- Spline agent for Apache Spark☆183Updated last week
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in…☆82Updated 5 months ago
- Apache DataSketches☆85Updated last year
- DataQuality for BigData☆139Updated 9 months ago
- Build configuration-driven ETL pipelines on Apache Spark☆157Updated last year
- An implementation of the DatasourceV2 interface of Apache Spark™ for writing Spark Datasets to Apache Druid™.☆41Updated last week
- Dremio Flight connector. Access Dremio using Arrow flight☆40Updated 3 years ago
- A library that provides useful extensions to Apache Spark and PySpark.☆193Updated last week
- Snowflake Data Source for Apache Spark.☆213Updated this week
- Multi-node Presto cluster on Docker containers☆120Updated 2 years ago