Building a data lakehouse with open-source technology. Supports an end-to-end data pipeline, from source data on AWS S3 to the lakehouse, plus visualization and a recommendation app.
☆41Dec 15, 2025Updated 6 months ago
Alternatives and similar repositories for building-lakehouse
Users that are interested in building-lakehouse are comparing it to the libraries listed below.
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data☆51Dec 2, 2023Updated 2 years ago
- Graduation project | Data Lakehouse☆45Feb 9, 2026Updated 4 months ago
- The Data Pipeline and Analytics Stack is a comprehensive solution designed for processing, storing, and visualizing data. Explore a compl…☆18Dec 26, 2023Updated 2 years ago
- Trino On K8S Via Helm & Metastore Workshop Querying Delta Tables☆12Jan 27, 2025Updated last year
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin…☆79Sep 2, 2023Updated 2 years ago
- Local AWS - a lightweight AWS service emulator☆48Jun 21, 2026Updated last week
- A custom end-to-end analytics platform for customer churn☆10May 15, 2025Updated last year
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC)☆69Sep 23, 2023Updated 2 years ago
- Reusable Python classes that extend open source PySpark capabilities. Examples of implementation are available under notebooks of repo htt…☆13Nov 1, 2024Updated last year
- Run an open-source data LakeHouse locally using Docker Compose☆12May 31, 2024Updated 2 years ago
- Helm Charts for RisingWave☆24Jun 11, 2026Updated 2 weeks ago
- ☆15Mar 24, 2026Updated 3 months ago
- Data Mesh Pattern☆39Oct 18, 2023Updated 2 years ago
- Data Pipeline that utilizes GCP, Python 3.10, Prefect, and more.☆10Jan 23, 2023Updated 3 years ago
- End-to-end data platform leveraging the Modern data stack☆52Apr 10, 2024Updated 2 years ago
- Files for the Docker and Kubernetes on Google Cloud Hands-On labs☆11Mar 14, 2023Updated 3 years ago
- ☆14Oct 18, 2020Updated 5 years ago
- To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a…☆46Mar 7, 2024Updated 2 years ago
- A turnkey MLOps pipeline demonstrating how to go from raw events to real-time predictions at scale.☆247Oct 21, 2025Updated 8 months ago
- KnetBuilder data integration platform for building knowledge graphs. Previously known as ondex.☆16Apr 2, 2026Updated 2 months ago
- ☆13Mar 30, 2024Updated 2 years ago
- ☆17Apr 1, 2025Updated last year
- SQLMesh example projects☆42Jul 2, 2025Updated 11 months ago
- ☆10Nov 2, 2023Updated 2 years ago
- Trino monitoring with JMX metrics through Prometheus and Grafana☆17Aug 14, 2024Updated last year
- This project applies the core knowledge from the LLMOps module, including the design and implementation of the API Layer, Inference Layer…☆76Dec 27, 2025Updated 6 months ago
- ☆26Jun 29, 2023Updated 3 years ago
- ☆10Apr 2, 2024Updated 2 years ago
- A curated list of awesome deep learning applications in the field of computational biology☆11Aug 3, 2016Updated 9 years ago
- In this project I have built an ETL pipeline which scrapes the trending repositories by month, week, and day LIVE extract other related info…☆12Sep 9, 2023Updated 2 years ago
- ☆10Feb 2, 2024Updated 2 years ago
- A Python CLI application that demonstrates how you can access AWS services, such as Amazon S3 and Amazon Athena, using trusted identity p…☆13Mar 11, 2025Updated last year
- The Modern Data Stack in a (Smaller) Box☆12Jan 28, 2023Updated 3 years ago
- ☆14Sep 18, 2018Updated 7 years ago
- Traditionally, engineers were needed to implement business logic via data pipelines before business users can start using it. Using this …☆12Updated this week
- MinIO as local storage and DynamoDB as catalog☆15May 14, 2024Updated 2 years ago
- Visualize linear programming at https://lpviz.net☆42Jun 21, 2026Updated last week
- Streaming Generative AI Application on AWS☆14Jun 24, 2024Updated 2 years ago
- This project implements an ELT (Extract - Load - Transform) data pipeline with the goodreads dataset, using dagster (orchestration), spar…☆44Apr 22, 2023Updated 3 years ago