minio / openlake
Build Data Lake using Open Source tools
☆99Updated 6 months ago
Alternatives and similar repositories for openlake
Users who are interested in openlake are comparing it to the repositories listed below
- To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a…☆35Updated last year
- Example for article Running Spark 3 with standalone Hive Metastore 3.0☆98Updated 2 years ago
- Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observ…☆147Updated this week
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC)☆56Updated last year
- ☆265Updated 6 months ago
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database☆75Updated 3 years ago
- Open Control Plane for Tables in Data Lakehouse☆350Updated this week
- Low Cost, Simple and Scalable Way of Data Replication to Apache Iceberg/Cloud/Data Lake☆257Updated last week
- Sparglim✨ makes PySpark App Configurable and Deploy Spark Connect Server Easier!☆37Updated 2 months ago
- Docker environment to stream data from Kafka to Iceberg tables☆28Updated last year
- A portable Datamart and Business Intelligence suite built with Docker, Dagster, dbt, DuckDB and Superset☆226Updated 3 months ago
- Apache Hive Metastore as a Standalone server in Docker☆75Updated 8 months ago
- Delta-Lake, ETL, Spark, Airflow☆47Updated 2 years ago
- Minimal example to run Trino, Minio, and Hive standalone metastore on docker☆52Updated 2 years ago
- Repository of helm charts for deploying DataHub on a Kubernetes cluster☆185Updated 3 weeks ago
- A Python package that creates fine-grained dbt tasks on Apache Airflow☆70Updated 7 months ago
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data☆46Updated last year
- Adapter for dbt that executes dbt pipelines on Apache Flink☆95Updated last year
- The Trino (https://trino.io/) adapter plugin for dbt (https://getdbt.com)☆235Updated last month
- Operator for Apache Spark-on-Kubernetes for Stackable Data Platform☆62Updated this week
- Building a Data Pipeline with an Open Source Stack☆54Updated 10 months ago
- A curated list of awesome DataOps tools☆188Updated 7 months ago
- Building Data Lakehouse by open source technology. Support end to end data pipeline, from source data on AWS S3 to Lakehouse, visualize a…☆26Updated last year
- dbt integration for Cube☆11Updated this week
- Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture☆72Updated 2 weeks ago
- The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever…☆246Updated 3 months ago
- REST API for Apache Spark on K8S or YARN☆98Updated this week
- Data product portal created by Dataminded☆185Updated this week
- dbt-starrocks contains all of the code enabling dbt to work with StarRocks☆37Updated 3 weeks ago
- Helm charts for Trino and Trino Gateway☆165Updated last week