fraibacas / lakehouse-poc
Run an open-source data LakeHouse locally using Docker Compose
☆11Updated last year
Alternatives and similar repositories for lakehouse-poc
Users who are interested in lakehouse-poc are comparing it to the repositories listed below
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/ ☆11Updated last year
- ☆17Updated 10 months ago
- Demonstrating the capabilities of DuckDB as a transformation engine for data lakes☆28Updated 8 months ago
- Cost Efficient Data Pipelines with DuckDB☆54Updated last month
- Repo for CDC with debezium blog post☆28Updated 9 months ago
- Discover the simplicity and strength of DuckDB, dbt, and Iceberg in this project. Create an efficient, versatile data analytics solution …☆34Updated last year
- ☆16Updated last year
- duckdb-etl-framework☆12Updated 6 months ago
- Code for data quality with greatexpectations blog☆12Updated 11 months ago
- Full stack data engineering tools and infrastructure set-up☆53Updated 4 years ago
- DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data qualit…☆59Updated last week
- Knowledge sharing - Cheat sheets☆11Updated 2 weeks ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin…☆72Updated last year
- A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in …☆22Updated 2 years ago
- Building a poor man's data lake: Exploring the Power of Polars and Delta Lake☆10Updated last month
- ☆10Updated 3 years ago
- SQLMesh example projects☆30Updated 7 months ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake.☆29Updated last week
- ☆34Updated last month
- Examples for High Performance Spark☆16Updated 7 months ago
- Unity Catalog UI☆40Updated 9 months ago
- ☆18Updated last year
- DataOps Observability is part of DataKitchen's Open Source Data Observability. DataOps Observability monitors every data journey from da…☆46Updated last month
- Sample code to collect Apache Iceberg metrics for table monitoring☆28Updated 10 months ago
- Code for my "Efficient Data Processing in SQL" book.☆56Updated 10 months ago
- learning-by-doing data model built with dbt-core☆13Updated 6 months ago
- Building Data Lakehouse by open source technology. Support end to end data pipeline, from source data on AWS S3 to Lakehouse, visualize a…☆31Updated last year
- dlt-dagster-demo☆11Updated last year
- Yet Another (Spark) ETL Framework☆21Updated last year
- A flake8 plugin that detects usage of withColumn in a loop or inside reduce☆28Updated last week