Building a data lakehouse with open-source technology. Supports an end-to-end data pipeline, from source data on AWS S3 to the lakehouse, visualization, and a recommendation app.
☆40 · Dec 15, 2025 · Updated 4 months ago
Alternatives and similar repositories for building-lakehouse
Users who are interested in building-lakehouse are comparing it to the repositories listed below.
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data ☆52 · Dec 2, 2023 · Updated 2 years ago
- Trino On K8S Via Helm & Metastore Workshop Querying Delta Tables ☆12 · Jan 27, 2025 · Updated last year
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, MinIO, Trino and a Hive Metastore. Can be used for local testin… ☆77 · Sep 2, 2023 · Updated 2 years ago
- Local AWS - a lightweight AWS service emulator ☆43 · Apr 12, 2026 · Updated last week
- A custom end-to-end analytics platform for customer churn ☆11 · May 15, 2025 · Updated 11 months ago
- Run an open-source data LakeHouse locally using Docker Compose ☆12 · May 31, 2024 · Updated last year
- npm package for ZetaSQL library ☆16 · Sep 3, 2024 · Updated last year
- 📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems. ☆68 · Jan 18, 2025 · Updated last year
- Data Pipeline that utilizes GCP, Python 3.10, Prefect, and more. ☆10 · Jan 23, 2023 · Updated 3 years ago
- Files for the Docker and Kubernetes on Google Cloud Hands-On labs ☆11 · Mar 14, 2023 · Updated 3 years ago
- ☆14 · Oct 18, 2020 · Updated 5 years ago
- To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a… ☆45 · Mar 7, 2024 · Updated 2 years ago
- ☆11 · Nov 26, 2024 · Updated last year
- A turnkey MLOps pipeline demonstrating how to go from raw events to real-time predictions at scale. ☆243 · Oct 21, 2025 · Updated 6 months ago
- GoogleSQL dialect format server using ZetaSQL ☆22 · Dec 16, 2021 · Updated 4 years ago
- Trino monitoring with JMX metrics through Prometheus and Grafana ☆17 · Aug 14, 2024 · Updated last year
- A Firebase Cloud Function and a Firebase hosted web app to treat weather data collected by Cloud IoT Core ☆18 · Mar 10, 2019 · Updated 7 years ago
- A collection of examples built with AWS DataOps Development Kit (DDK) ☆43 · Mar 23, 2026 · Updated 3 weeks ago
- This project applies the core knowledge from the LLMOps module, including the design and implementation of the API Layer, Inference Layer… ☆72 · Dec 27, 2025 · Updated 3 months ago
- ZetaSQL analyzer ☆19 · Sep 11, 2020 · Updated 5 years ago
- VSCode extension for working with Architecture As A Code in the C4 model. Includes syntax highlighting, diagram preview, and tools for wo… ☆36 · Apr 7, 2026 · Updated last week
- ☆26 · Jun 29, 2023 · Updated 2 years ago
- Short Range Ultrasonic Radar - A simple radar using the ultrasonic sensor, this radar works by measuring a range from 3 cm to 40 cm as non… ☆19 · Nov 11, 2024 · Updated last year
- ☆10 · Apr 2, 2024 · Updated 2 years ago
- ☆10 · Feb 2, 2024 · Updated 2 years ago
- A Python CLI application that demonstrates how you can access AWS services, such as Amazon S3 and Amazon Athena, using trusted identity p… ☆13 · Mar 11, 2025 · Updated last year
- A platform for Applied Reinforcement Learning (Applied RL) ☆13 · Jan 19, 2019 · Updated 7 years ago
- ☆11 · Aug 20, 2024 · Updated last year
- Traditionally, engineers were needed to implement business logic via data pipelines before business users can start using it. Using this … ☆12 · Updated this week
- MinIO as local storage and DynamoDB as catalog ☆15 · May 14, 2024 · Updated last year
- This project implements an ELT (Extract - Load - Transform) data pipeline with the goodreads dataset, using dagster (orchestration), spar… ☆43 · Apr 22, 2023 · Updated 2 years ago
- Automate data collection from Spotify's worldwide ranking in 50+ countries ☆24 · May 3, 2020 · Updated 5 years ago
- A table-format-agnostic data sharing framework ☆42 · Feb 4, 2024 · Updated 2 years ago
- A Spark Connector that reads data from / writes data to Arrow-Flight end-points with Arrow-Flight and Flight-SQL ☆48 · Updated this week
- Code for the paper: Kernel Distributionally Robust Optimization ☆13 · Feb 21, 2021 · Updated 5 years ago
- ☆23 · Apr 8, 2026 · Updated last week
- Robust Bond Portfolio Construction via Convex-Concave Saddle Point Optimization ☆14 · May 13, 2024 · Updated last year
- End-to-End deployment of E-commerce customers segmentation using Clustering Machine learning algorithms in Google Cloud Platform and MLOp… ☆19 · Jun 5, 2024 · Updated last year
- ☆16 · Oct 18, 2023 · Updated 2 years ago