Building a data lakehouse with open-source technologies. Supports an end-to-end data pipeline, from source data on AWS S3 to the lakehouse, plus visualization and a recommendation app.
☆40 · Dec 15, 2025 · Updated 5 months ago
Alternatives and similar repositories for building-lakehouse
Users who are interested in building-lakehouse are comparing it to the libraries listed below.
- Graduation project | Data Lakehouse · ☆42 · Feb 9, 2026 · Updated 3 months ago
- ☆68 · Sep 24, 2025 · Updated 7 months ago
- Trino on K8s via Helm & Metastore workshop: querying Delta tables · ☆12 · Jan 27, 2025 · Updated last year
- Reusable Python classes that extend open-source PySpark capabilities. Implementation examples are available under the notebooks of repo htt… · ☆13 · Nov 1, 2024 · Updated last year
- Run an open-source data lakehouse locally using Docker Compose · ☆12 · May 31, 2024 · Updated last year
- npm package for ZetaSQL library · ☆16 · Sep 3, 2024 · Updated last year
- ☆15 · Mar 24, 2026 · Updated last month
- Data Mesh Pattern · ☆39 · Oct 18, 2023 · Updated 2 years ago
- Data Pipeline that utilizes GCP, Python 3.10, Prefect, and more. · ☆10 · Jan 23, 2023 · Updated 3 years ago
- End-to-end data platform leveraging the Modern data stack · ☆52 · Apr 10, 2024 · Updated 2 years ago
- ☆14 · Oct 18, 2020 · Updated 5 years ago
- To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a… · ☆45 · Mar 7, 2024 · Updated 2 years ago
- ☆11 · Nov 26, 2024 · Updated last year
- A turnkey MLOps pipeline demonstrating how to go from raw events to real-time predictions at scale. · ☆245 · Oct 21, 2025 · Updated 6 months ago
- KnetBuilder data integration platform for building knowledge graphs (previously known as ondex) · ☆15 · Apr 2, 2026 · Updated last month
- ☆11 · Oct 19, 2023 · Updated 2 years ago
- ☆17 · Apr 1, 2025 · Updated last year
- SQLMesh example projects · ☆41 · Jul 2, 2025 · Updated 10 months ago
- GoogleSQL dialect format server using ZetaSQL · ☆22 · Dec 16, 2021 · Updated 4 years ago
- ☆10 · Nov 2, 2023 · Updated 2 years ago
- Trino monitoring with JMX metrics through Prometheus and Grafana · ☆17 · Aug 14, 2024 · Updated last year
- A collection of examples built with the AWS DataOps Development Kit (DDK) · ☆43 · Mar 23, 2026 · Updated last month
- ICDE 2025 paper: Grounding Natural Language to SQL Translation with Data-Based Self-Explanations · ☆17 · May 24, 2025 · Updated 11 months ago
- This project applies the core knowledge from the LLMOps module, including the design and implementation of the API Layer, Inference Layer… · ☆75 · Dec 27, 2025 · Updated 4 months ago
- ☆26 · Jun 29, 2023 · Updated 2 years ago
- ☆10 · Apr 2, 2024 · Updated 2 years ago
- A curated list of awesome deep learning applications in the field of computational biology · ☆11 · Aug 3, 2016 · Updated 9 years ago
- In this project I built an ETL pipeline that scrapes trending repositories by month, week, and day, LIVE extract other related info… · ☆12 · Sep 9, 2023 · Updated 2 years ago
- A Python CLI application that demonstrates how you can access AWS services, such as Amazon S3 and Amazon Athena, using trusted identity p… · ☆13 · Mar 11, 2025 · Updated last year
- A platform for Applied Reinforcement Learning (Applied RL) · ☆14 · Jan 19, 2019 · Updated 7 years ago
- Traditionally, engineers were needed to implement business logic via data pipelines before business users could start using it. Using this … · ☆12 · Updated this week
- This project implements an ELT (Extract, Load, Transform) data pipeline with the Goodreads dataset, using Dagster (orchestration), spar… · ☆44 · Apr 22, 2023 · Updated 3 years ago
- Hybrid Vector Search · ☆27 · May 4, 2026 · Updated 2 weeks ago
- Automate data collection from Spotify's worldwide ranking in 50+ countries · ☆25 · May 3, 2020 · Updated 6 years ago
- A table-format-agnostic data sharing framework · ☆42 · Feb 4, 2024 · Updated 2 years ago
- A cloud-native data mesh implementation · ☆12 · Jan 15, 2021 · Updated 5 years ago
- Code for the paper: Kernel Distributionally Robust Optimization · ☆13 · Feb 21, 2021 · Updated 5 years ago
- ☆23 · Updated this week
- ☆16 · Oct 18, 2023 · Updated 2 years ago