manuzhang / awesome-lakehouse
A curated list of awesome lakehouse frameworks, applications, etc.
☆27 · Updated 2 months ago
Alternatives and similar repositories for awesome-lakehouse:
Users who are interested in awesome-lakehouse are comparing it to the repositories listed below.
- CLI tool to bulk migrate tables from one catalog to another without copying data (☆77, updated 3 weeks ago)
- Monitoring and insights on your data lakehouse tables (☆28, updated this week)
- No description (☆29, updated 5 months ago)
- A table-format-agnostic data sharing framework (☆38, updated last year)
- Demos for Nessie. Nessie provides Git-like capabilities for your data lake. (☆29, updated this week)
- Kafka Connector for Iceberg tables (☆16, updated last year)
- Unity Catalog UI (☆40, updated 8 months ago)
- Library to convert dbt manifest metadata to Airflow tasks (☆48, updated last year)
- Repo for everything about open table formats (Iceberg, Hudi, Delta Lake) and the overall lakehouse architecture (☆67, updated this week)
- Adapter for dbt that executes dbt pipelines on Apache Flink (☆95, updated last year)
- A dbt adapter for Decodable (☆12, updated 2 months ago)
- Open, multi-modal catalog for data and AI, written in Rust (☆79, updated 7 months ago)
- Multi-hop declarative data pipelines (☆115, updated this week)
- Yet Another (Spark) ETL Framework (☆21, updated last year)
- Utility functions for dbt projects running on Trino (☆21, updated last year)
- The Trino (https://trino.io/) adapter plugin for dbt (https://getdbt.com) (☆233, updated last month)
- Apache Hive Metastore as a standalone server in Docker (☆74, updated 8 months ago)
- A Python library to support running data quality rules while the Spark job is running ⚡ (☆187, updated this week)
- Delta Lake helper methods. No Spark dependency. (☆23, updated 7 months ago)
- A DuckDB-powered command-line interface for Snowflake security, governance, operations, and cost optimization (☆40, updated 8 months ago)
- No description (☆80, updated 2 weeks ago)
- The Internals of Spark on Kubernetes (☆71, updated 2 years ago)
- A library that brings useful functions from various modern database management systems to Apache Spark (☆58, updated last year)
- No description (☆17, updated 10 months ago)
- A write-audit-publish implementation on a data lake without the JVM (☆46, updated 8 months ago)
- Trino dbt demo project to mix BigQuery data with a local PostgreSQL database and load it into that database (☆75, updated 3 years ago)
- dbt-starrocks contains all of the code enabling dbt to work with StarRocks (☆33, updated 2 weeks ago)
- Sample code to collect Apache Iceberg metrics for table monitoring (☆26, updated 8 months ago)
- QTag: Turbocharge Your SQL Comments (☆13, updated 3 months ago)
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… (☆94, updated 2 weeks ago)