harrydevforlife / building-lakehouse
Building a Data Lakehouse with open source technology. Supports an end-to-end data pipeline, from source data on AWS S3 to the Lakehouse, visualization, and a recommendation app.
☆35 Updated last month
Alternatives and similar repositories for building-lakehouse
Users that are interested in building-lakehouse are comparing it to the libraries listed below
- Code snippets for Data Engineering Design Patterns book ☆324 Updated last month
- build dw with dbt ☆50 Updated last year
- Open source stack lakehouse ☆25 Updated last year
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin… ☆75 Updated 2 years ago
- To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a… ☆44 Updated last year
- End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore,… ☆47 Updated last year
- Cost Efficient Data Pipelines with DuckDB ☆61 Updated 8 months ago
- Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team … ☆131 Updated this week
- Building a Data Pipeline with an Open Source Stack ☆55 Updated 7 months ago
- Quick Guides from Dremio on Several topics ☆81 Updated 2 months ago
- A write-audit-publish implementation on a data lake without the JVM ☆45 Updated last year
- A Python package that creates fine-grained dbt tasks on Apache Airflow ☆81 Updated last week
- Delta Lake Documentation ☆53 Updated last year
- A portable Datamart and Business Intelligence suite built with Docker, Airflow, dbt, duckdb and Superset ☆46 Updated last month
- New generation opensource data stack ☆76 Updated 3 years ago
- A Docker Compose template that builds an interactive development environment for PySpark with Jupyter Lab, MinIO as object storage, Hive M… ☆47 Updated last year
- Demo DAGs that show how to run dbt Core in Airflow using Cosmos ☆67 Updated 8 months ago
- A custom end-to-end analytics platform for customer churn ☆11 Updated 8 months ago
- The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever… ☆279 Updated 3 months ago
- The Open-Source Enterprise Data Platform in a single Portal ☆264 Updated last week
- Delta Lake examples ☆237 Updated last year
- This repository serves as a comprehensive guide to effective data modeling and robust data quality assurance using popular open-source to… ☆37 Updated 2 years ago
- New Generation Opensource Data Stack Demo ☆454 Updated 2 years ago
- Sample code to collect Apache Iceberg metrics for table monitoring ☆29 Updated last year
- A portable Datamart and Business Intelligence suite built with Docker, Dagster, dbt, DuckDB and Superset ☆258 Updated last month
- Delta-Lake, ETL, Spark, Airflow ☆48 Updated 3 years ago
- A demonstration of an ELT (Extract, Load, Transform) pipeline ☆31 Updated last year
- Code for dbt tutorial ☆166 Updated 4 months ago
- Local Environment to Practice Data Engineering ☆143 Updated last year
- Code for "Efficient Data Processing in Spark" Course ☆356 Updated 3 months ago