Builds a data lakehouse with open-source technology. Supports an end-to-end data pipeline, from source data on AWS S3 to the lakehouse, visualization, and a recommendation app.
☆40Dec 15, 2025Updated 5 months ago
Alternatives and similar repositories for building-lakehouse
Users that are interested in building-lakehouse are comparing it to the libraries listed below.
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data☆51Dec 2, 2023Updated 2 years ago
- Graduation project | Data Lakehouse☆44Feb 9, 2026Updated 3 months ago
- The Data Pipeline and Analytics Stack is a comprehensive solution designed for processing, storing, and visualizing data. Explore a compl…☆18Dec 26, 2023Updated 2 years ago
- Trino On K8S Via Helm & Metastore Workshop Querying Delta Tables☆12Jan 27, 2025Updated last year
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin…☆78Sep 2, 2023Updated 2 years ago
- ☆15Mar 24, 2026Updated 2 months ago
- End-to-end data platform leveraging the Modern data stack☆52Apr 10, 2024Updated 2 years ago
- Files for the Docker and Kubernetes on Google Cloud Hands-On labs☆11Mar 14, 2023Updated 3 years ago
- This is a demo project to compare two web scraping frameworks, Playwright and Selenium, using the new pipelining tool Dagster☆15Sep 9, 2021Updated 4 years ago
- To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a…☆45Mar 7, 2024Updated 2 years ago
- ☆11Nov 26, 2024Updated last year
- KnetBuilder data integration platform for building knowledge graphs. Previously known as ondex.☆15Apr 2, 2026Updated 2 months ago
- ☆13Mar 30, 2024Updated 2 years ago
- ☆17Apr 1, 2025Updated last year
- GoogleSQL dialect format server using ZetaSQL☆22Dec 16, 2021Updated 4 years ago
- ☆10Nov 2, 2023Updated 2 years ago
- Trino monitoring with JMX metrics through Prometheus and Grafana☆17Aug 14, 2024Updated last year
- This project applies the core knowledge from the LLMOps module, including the design and implementation of the API Layer, Inference Layer…☆75Dec 27, 2025Updated 5 months ago
- ☆26Jun 29, 2023Updated 2 years ago
- ☆10Apr 2, 2024Updated 2 years ago
- Spark-based pipeline to extract and parse monthly games from the Lichess database.☆21Sep 22, 2025Updated 8 months ago
- A curated list of awesome deep learning applications in the field of computational biology☆11Aug 3, 2016Updated 9 years ago
- In this project I have built an ETL pipeline which scrapes the trending repositories by month, week and day LIVE extract other related info…☆12Sep 9, 2023Updated 2 years ago
- ☆10Feb 2, 2024Updated 2 years ago
- ☆11Aug 20, 2024Updated last year
- A tool to detect tissue- and cancer- specific epigenetic signatures in WGS data of liquid biopsies☆10Mar 30, 2023Updated 3 years ago
- MinIO as local storage and DynamoDB as catalog☆15May 14, 2024Updated 2 years ago
- Streaming Generative AI Application on AWS☆14Jun 24, 2024Updated last year
- Hybrid Vector Search☆28May 4, 2026Updated last month
- Automate data collection from Spotify's worldwide ranking in 50+ countries☆25May 3, 2020Updated 6 years ago
- A cloud native data mesh implementation☆12Jan 15, 2021Updated 5 years ago
- Code for the paper: Kernel Distributionally Robust Optimization☆13Feb 21, 2021Updated 5 years ago
- A Spark Connector that reads data from / writes data to Arrow-Flight end-points with Arrow-Flight and Flight-SQL☆48Updated this week
- ☆23May 18, 2026Updated 3 weeks ago
- End-to-End deployment of E-commerce customers segmentation using Clustering Machine learning algorithms in Google Cloud Platform and MLOp…☆19Jun 5, 2024Updated 2 years ago
- Trino + Hive + MinIO with Postgres in Docker Compose☆27Aug 18, 2023Updated 2 years ago
- ☆16Oct 18, 2023Updated 2 years ago
- ☆16Feb 11, 2026Updated 3 months ago
- ☆20Mar 13, 2025Updated last year