fraibacas / lakehouse-poc
Run an open-source data LakeHouse locally using Docker Compose
☆11Updated 10 months ago
Alternatives and similar repositories for lakehouse-poc:
Users who are interested in lakehouse-poc are comparing it to the repositories listed below
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, MinIO, Trino and a Hive Metastore. Can be used for local testin…☆63Updated last year
- Full-stack data engineering tools and infrastructure setup☆50Updated 4 years ago
- Discover the simplicity and strength of DuckDB, dbt, and Iceberg in this project. Create an efficient, versatile data analytics solution …☆34Updated last year
- ☆17Updated 7 months ago
- A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in …☆21Updated 2 years ago
- Repo for the CDC with Debezium blog post☆28Updated 6 months ago
- Automate and streamline the alerting & notification process for dbt test results🐞🚀☆17Updated last month
- ☆16Updated last year
- Cost Efficient Data Pipelines with DuckDB☆51Updated 8 months ago
- Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observ…☆134Updated 2 months ago
- Building a data lakehouse with open-source technology. Supports an end-to-end data pipeline, from source data on AWS S3 to the lakehouse, visualize a…☆24Updated 11 months ago
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/☆11Updated 10 months ago
- A write-audit-publish implementation on a data lake without the JVM☆46Updated 7 months ago
- ☆16Updated last year
- Unity Catalog UI☆40Updated 6 months ago
- A simple Data Engineering solution for testing or education purposes. You only need to know SQL and Python to understand this project. Da…☆25Updated 2 years ago
- ☆33Updated 3 weeks ago
- Personal project for setting up an open source data warehouse.☆29Updated 2 months ago
- Demonstrating the capabilities of DuckDB as a transformation engine for data lakes☆23Updated 5 months ago
- Query Iceberg in Trino, with Nessie as the catalog and MinIO in place of AWS S3☆18Updated 10 months ago
- A kind data platform on your local machine. 🤗☆10Updated last week
- A portable Datamart and Business Intelligence suite built with Docker, Mage, dbt, DuckDB and Superset☆53Updated 4 months ago
- A minimal Docker Compose setup for experimenting with cloud-agnostic lakehouse architectures: Apache Spark with Hive Metastore + Delta Lak…☆21Updated 11 months ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, dbt, Airflow, Kafka, Debezium CDC)☆53Updated last year
- Code for my "Efficient Data Processing in SQL" book.☆56Updated 7 months ago
- Public-facing example applications built using the Snowflake Native App Framework☆43Updated 2 months ago
- dbt-starrocks contains all of the code enabling dbt to work with StarRocks☆27Updated last week
- learning-by-doing data model built with dbt-core☆11Updated 3 months ago
- Data-aware orchestration with Dagster, dbt, and Airbyte☆31Updated 2 years ago
- Pipeline library for StreamSets Data Collector and Transformer☆33Updated 2 years ago