leehuwuj / olhLinks

Open source stack lakehouse

☆25

Alternatives and similar repositories for olh

Users that are interested in olh are comparing it to the libraries listed below

Sorting:

dipankarmazumdar / awesome-lakehouse-guide
Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture
☆124Updated 2 weeks ago
bitsondatadev / trino-getting-started
☆269Updated last year
victorcouste / trino-dbt-demo
Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database
☆77Updated 4 years ago
Nike-Inc / spark-expectations
A Python Library to support running data quality rules while the spark job is running⚡
☆193Updated this week
getindata / dbt-flink-adapter
Adapter for dbt that executes dbt pipelines on Apache Flink
☆96Updated last year
DataKitchen / dataops-observability
DataOps Observability is part of DataKitchen's Open Source Data Observability. DataOps Observability monitors every data journey from da…
☆50Updated 3 weeks ago
rajagurunath / lakehouse-sharing
A Table format agnostic data sharing framework
☆42Updated last year
dominikhei / Local-Data-LakeHouse
Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin…
☆75Updated 2 years ago
BauplanLabs / no-jvm-wap-with-iceberg
A write-audit-publish implementation on a data lake without the JVM
☆45Updated last year
zsvoboda / ngods-stocks
New Generation Opensource Data Stack Demo
☆452Updated 2 years ago
dataflint / spark
Drop-in replacement for Apache Spark UI
☆354Updated last week
ryft-io / iceberg-mcp
☆40Updated 7 months ago
developer-advocacy-dremio / quick-guides-from-dremio
Quick Guides from Dremio on Several topics
☆79Updated last week
1ambda / lakehouse
Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC)
☆63Updated 2 years ago
mrpowers-io / levi
Delta Lake helper methods. No Spark dependency.
☆23Updated last year
sibytes / yetl
Yet Another (Spark) ETL Framework
☆21Updated 2 years ago
starlake-ai / starlake
Declarative text based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.
☆172Updated this week
delta-io / delta-examples
Delta Lake examples
☆233Updated last year
memiiso / debezium-server-iceberg
Low Cost, Simple and Scalable Way of Data Replication to Apache Iceberg/Cloud/Data Lake
☆290Updated last week
linkedin / openhouse
Open Control Plane for Tables in Data Lakehouse
☆371Updated last week
dremio / dbt-dremio
dbt (data build tool) adapter for the Dremio
☆52Updated last month
naushadh / hive-metastore
Apache Hive Metastore as a Standalone server in Docker
☆80Updated last year
Nike-Inc / brickflow
Pythonic Programming Framework to orchestrate jobs in Databricks Workflow
☆222Updated last week
conveyordata / data-product-portal
Data Product Portal created by Dataminded
☆195Updated last week
harrydevforlife / building-lakehouse
Building Data Lakehouse by open source technology. Support end to end data pipeline, from source data on AWS S3 to Lakehouse, visualize a…
☆33Updated last year
dttung2905 / flink-at-scale
📚 Tech blogs & talks by companies that run Apache Flink in production
☆184Updated 2 weeks ago
gmyrianthous / dbt-airflow
A Python package that creates fine-grained dbt tasks on Apache Airflow
☆77Updated this week
borjavb / dbt-iceberg-poc
☆80Updated last year
hussein-awala / spark-on-k8s
A Python package to submit and manage Apache Spark applications on Kubernetes.
☆44Updated 3 months ago
getindata / dbt-airflow-factory
Library to convert DBT manifest metadata to Airflow tasks
☆49Updated last year