leehuwuj / olh
Open source stack lakehouse
☆25, updated 6 months ago
Related projects:
- A custom end-to-end data pipeline for customer churn (☆9, updated this week)
- Delta Lake Documentation (☆45, updated 3 months ago)
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, MinIO, Trino and a Hive Metastore. Can be used for local testing. (☆51, updated last year)
- A Python package to submit and manage Apache Spark applications on Kubernetes. (☆33, updated last month)
- Simple stream processing pipeline (☆89, updated 3 months ago)
- A table-format-agnostic data sharing framework (☆36, updated 7 months ago)
- Yet Another (Spark) ETL Framework (☆18, updated 10 months ago)
- Trino dbt demo project to mix and load BigQuery data with a local PostgreSQL database (☆64, updated 3 years ago)
- ☆22, updated last year
- Code snippets for the Data Engineering Design Patterns book (☆27, updated this week)
- A Python library to support running data quality rules while the Spark job is running ⚡ (☆161, updated last month)
- Apache Hive Metastore as a standalone server in Docker (☆64, updated 3 weeks ago)
- Pythonic programming framework to orchestrate jobs in Databricks Workflows (☆185, updated this week)
- Delta Lake helper methods. No Spark dependency. (☆21, updated last week)
- Delta Lake examples (☆201, updated 3 months ago)
- The Complete Big Data Installation Solutions (☆14, updated last year)
- ☆232, updated this week
- A sample implementation of stream writes to an Iceberg table on GCS using Flink, with reads via Trino (☆15, updated 2 years ago)
- Creation of a data lakehouse and an ELT pipeline to enable efficient analysis and use of data (☆37, updated 9 months ago)
- ☆12, updated last year
- Quick guides from Dremio on several topics (☆60, updated 2 weeks ago)
- Delta Lake and filesystem helper methods (☆48, updated 6 months ago)
- Delta Lake, ETL, Spark, Airflow (☆42, updated last year)
- A Microsoft Power BI custom connector allowing you to import Trino data into Power BI (☆42, updated last week)
- PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows (☆41, updated 2 months ago)
- Spark data pipeline that processes movie ratings data (☆26, updated last month)
- ☆197, updated last month
- Step-by-step tutorial on building a Kimball dimensional model with dbt (☆100, updated 2 months ago)
- Example of how to leverage Apache Spark's distributed capabilities to call a REST API using a UDF (☆47, updated last year)
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, dbt, Airflow, Kafka, Debezium CDC) (☆40, updated 11 months ago)
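One of the projects listed above demonstrates calling a REST API from a Spark UDF. A minimal sketch of that pattern follows; the endpoint URL, column names, and helper names here are hypothetical, not taken from any of the listed repositories:

```python
import json
from urllib import request as urlrequest


def build_payload(row: dict) -> str:
    """Serialize one row deterministically for the API call."""
    return json.dumps(row, sort_keys=True)


def call_api(payload: str, endpoint: str = "https://api.example.com/score") -> str:
    """POST one row's JSON payload to a REST endpoint and return the raw
    response body. The endpoint is a placeholder; a production job would
    add authentication, retries, and error handling."""
    req = urlrequest.Request(
        endpoint,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlrequest.urlopen(req, timeout=5) as resp:
        return resp.read().decode("utf-8")


# Wiring this into Spark (assumes pyspark is installed). The UDF runs
# once per row on the executors, so each row issues its own HTTP call:
#
# from pyspark.sql.functions import udf
# from pyspark.sql.types import StringType
#
# call_api_udf = udf(call_api, StringType())
# df = df.withColumn("api_result", call_api_udf(df["payload"]))
```

Because a plain UDF opens a connection per row, jobs with many rows often prefer `mapPartitions`, which lets one HTTP session be reused across a whole partition.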