minio / openlake
Build Data Lake using Open Source tools
☆101Updated last month
Alternatives and similar repositories for openlake
Users that are interested in openlake are comparing it to the libraries listed below
- Docker environment to stream data from Kafka to Iceberg tables☆29Updated last year
- Apache Hive Metastore as a Standalone server in Docker☆79Updated 10 months ago
- Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observ…☆153Updated this week
- Minimal example to run Trino, Minio, and Hive standalone metastore on docker☆52Updated 3 years ago
- A Microsoft Power BI Custom Connector allowing you to import Trino data into Power BI.☆70Updated 5 months ago
- ☆265Updated 8 months ago
- Apache Flink (Pyflink) and Related Projects☆39Updated 2 months ago
- Apache Spark Kubernetes Operator☆180Updated this week
- A table-format-agnostic data sharing framework☆38Updated last year
- ☆54Updated this week
- To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a…☆35Updated last year
- A kubernetes operator for Apache NiFi☆36Updated this week
- Operator for Apache Spark-on-Kubernetes for Stackable Data Platform☆63Updated last week
- Unity Catalog UI☆40Updated 9 months ago
- Open Control Plane for Tables in Data Lakehouse☆358Updated this week
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC)☆58Updated last year
- Example for article Running Spark 3 with standalone Hive Metastore 3.0☆99Updated 2 years ago
- Data product portal created by Dataminded☆186Updated this week
- REST API for Apache Spark on K8S or YARN☆98Updated 2 weeks ago
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database☆75Updated 3 years ago
- Helm charts for Trino and Trino Gateway☆170Updated last week
- ODD Specification is a universal open standard for collecting metadata.☆142Updated 7 months ago
- The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever…☆254Updated 4 months ago
- Stackable Operator for Apache Airflow☆28Updated last week
- Quick Guides from Dremio on Several topics☆71Updated this week
- ☆15Updated 2 years ago
- ☆25Updated last year
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python☆44Updated 2 years ago
- Collection of assets used for various articles at https://blogs.min.io☆37Updated 3 months ago
- Repository of helm charts for deploying DataHub on a Kubernetes cluster☆190Updated 2 weeks ago