Building a data lakehouse with open-source technology. Supports an end-to-end data pipeline, from source data on AWS S3 to the lakehouse, with visualization and a recommendation app.
☆38Dec 15, 2025Updated 2 months ago
Alternatives and similar repositories for building-lakehouse
Users that are interested in building-lakehouse are comparing it to the libraries listed below
- Graduation project | Data Lakehouse☆36Feb 9, 2026Updated last month
- A custom end-to-end analytics platform for customer churn☆11May 15, 2025Updated 9 months ago
- ☆66Sep 24, 2025Updated 5 months ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin…☆75Sep 2, 2023Updated 2 years ago
- VSCode extension for working with Architecture As A Code in the C4 model. Includes syntax highlighting, diagram preview, and tools for wo…☆32Feb 25, 2026Updated last week
- 📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.☆59Jan 18, 2025Updated last year
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC)☆66Sep 23, 2023Updated 2 years ago
- Data Mesh Pattern☆38Oct 18, 2023Updated 2 years ago
- ☆11Oct 19, 2023Updated 2 years ago
- Natural Language Processing Project☆11Jul 6, 2021Updated 4 years ago
- This is a demo project to compare two web scraping frameworks, Playwright and Selenium, using the new pipelining tool Dagster☆15Sep 9, 2021Updated 4 years ago
- Package for parsing XBRL☆10Jul 3, 2022Updated 3 years ago
- Automation for SAP - Collection of Ansible Modules for various tasks using SAP Launchpad APIs☆13Nov 13, 2025Updated 3 months ago
- A web application implemented in Rust that, modeled on Chat2DB, uses ChatGPT to translate natural language and then operate on a database.☆10Jan 13, 2025Updated last year
- A Spark Connector that reads data from / writes data to Arrow-Flight end-points with Arrow-Flight and Flight-SQL☆46Dec 14, 2025Updated 2 months ago
- Generate test cases from Jira or Azure using AI☆11Apr 6, 2024Updated last year
- End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore,…☆48Oct 14, 2024Updated last year
- A Python CLI application that demonstrates how you can access AWS services, such as Amazon S3 and Amazon Athena, using trusted identity p…☆12Mar 11, 2025Updated 11 months ago
- Amazon Marketing Cloud Insights on AWS helps advertisers and agencies running campaigns on Amazon Ads to easily deploy AWS services to st…☆16Nov 3, 2025Updated 4 months ago
- ⚡ FutureGPT - Application development framework that connects GPT-4 with external data, the internet, other applications and language mod…☆12May 14, 2023Updated 2 years ago
- 🚀 Portfolio: Co-Pilot, 💡 Investing: Idea Generation, 🚦Trade: Due Diligence☆17Jun 27, 2025Updated 8 months ago
- ☆15May 27, 2025Updated 9 months ago
- Tests of how we can gather the most out of AutoGPT. Huge thanks to @Torantulino for this☆13Apr 16, 2023Updated 2 years ago
- This project provides an AI-driven test case generator using FastAPI. The application accepts a GitHub repository name and generates test…☆19Jun 7, 2024Updated last year
- ☆11Nov 26, 2024Updated last year
- ☆12Apr 13, 2024Updated last year
- My implementation of "Build a Stock-Tracking CLI with Async Streams in Rust" - The Actor Model☆10Sep 20, 2024Updated last year
- Python packages for Support Vector Regression with Linear Constraints☆10Jul 9, 2020Updated 5 years ago
- ☆10Apr 2, 2024Updated last year
- Generic Pipelines / Templates for Data Factory / Synapse Pipelines w.r.t Different MSFT Offering Integrations / Use Cases☆11Sep 26, 2025Updated 5 months ago
- 100% natural cola recipe based on Open Cola recipe.☆13Mar 16, 2021Updated 4 years ago
- Utility package that, given a Pandas DataFrame, it uses the DataSchema class which auto-infers feature types and automatically calculates…☆16Feb 18, 2025Updated last year
- A Terraform Version Manager written in Go☆11Oct 3, 2023Updated 2 years ago
- The Stock Market Management System is a Python application designed to simulate and understand the workings of the stock market☆11Nov 5, 2023Updated 2 years ago
- Oh no! It's another version manager for Terraform☆11Jun 24, 2022Updated 3 years ago
- Serverless Multi-Tenant Application on AWS Amplify☆17Jan 11, 2024Updated 2 years ago
- Mastering Docker Enterprise published by Packt☆11Jan 30, 2023Updated 3 years ago
- Code for the paper: Kernel Distributionally Robust Optimization☆13Feb 21, 2021Updated 5 years ago
- A demo site built on top of Kafka topics☆13Feb 28, 2026Updated last week