harrydevforlife / building-lakehouse

Building a data lakehouse with open source technology. Supports an end-to-end data pipeline, from source data on AWS S3 to the lakehouse, plus visualization and a recommendation app.

☆31

Alternatives and similar repositories for building-lakehouse

Users who are interested in building-lakehouse are comparing it to the repositories listed below.

DataKitchen / data-observability-installer
Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team …
☆121 Updated last week
zsvoboda / ngods
New generation opensource data stack
☆70 Updated 3 years ago
hnawaz007 / dbt-dw
build dw with dbt
☆47 Updated 8 months ago
bartosz25 / data-engineering-design-patterns-book
Code snippets for Data Engineering Design Patterns book
☆128 Updated 3 months ago
dominikhei / Local-Data-LakeHouse
Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin…
☆73 Updated last year
conveyordata / data-product-portal
Data product portal created by Dataminded
☆186 Updated this week
aws-samples / monitoring-apache-iceberg-table-metadata-layer
Sample code to collect Apache Iceberg metrics for table monitoring
☆28 Updated 11 months ago
gmyrianthous / dbt-airflow
A Python package that creates fine-grained dbt tasks on Apache Airflow
☆70 Updated 9 months ago
dbt-labs / dbt-starter-project
Cloned by the `dbt init` task
☆60 Updated last year
Stefen-Taime / Iceberg-Dbt-Trino-Hive-modern-open-source-data-stack
To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a…
☆37 Updated last year
developer-advocacy-dremio / quick-guides-from-dremio
Quick Guides from Dremio on Several topics
☆72 Updated 3 weeks ago
adidas / lakehouse-engine
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever…
☆255 Updated 2 weeks ago
pracdata / duckdb-pipeline
Demonstrating the capabilities of DuckDB as a transformation engine for data lakes
☆28 Updated 9 months ago
josephmachado / cost_effective_data_pipelines
Cost Efficient Data Pipelines with DuckDB
☆54 Updated 2 months ago
jaehyeon-kim / dbt-on-aws
dbt (data build tool) projects targeting AWS analytics services (redshift, glue, emr, athena) and open table formats
☆29 Updated 2 years ago
mrn-aglic / apache-iceberg-data-exploration
☆18 Updated last year
abeltavares / real-time-data-pipeline
📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.
☆46 Updated 5 months ago
arezamoosavi / AcidOnSpark-ETL
Delta-Lake, ETL, Spark, Airflow
☆47 Updated 2 years ago
unitycatalog / unitycatalog-ui
Unity Catalog UI
☆41 Updated 10 months ago
franloza / coches-net-dashboard
Sample project that uses Dagster, dbt, DuckDB and Dash to visualize the Spanish car and motorcycle market
☆58 Updated 2 years ago
airbytehq / open-data-stack
Open Data Stack Projects: Examples of End to End Data Engineering Projects
☆86 Updated 2 years ago
ssp-data / awesome-dagster
A curated list of dagster code snippets for data engineers
☆56 Updated last year
delta-io / delta-docs
Delta Lake Documentation
☆49 Updated last year
dipankarmazumdar / awesome-lakehouse-guide
Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture
☆89 Updated 3 weeks ago
delta-io / delta-examples
Delta Lake examples
☆226 Updated 9 months ago
alonsomedo / os-data-stack
Building a Data Pipeline with an Open Source Stack
☆55 Updated 2 weeks ago
ognis1205 / delta-hub
A platform and cloud-based service for data sharing based on the Delta Sharing protocol.
☆21 Updated last year
cnstlungu / portable-data-stack-dagster
A portable Datamart and Business Intelligence suite built with Docker, Dagster, dbt, DuckDB and Superset
☆234 Updated 5 months ago
kaxil / airflowctl
A CLI tool to streamline getting started with Apache Airflow™ and managing multiple Airflow projects
☆218 Updated 2 months ago
josephmachado / simple_dbt_project
Code for dbt tutorial
☆156 Updated last month