dipankarmazumdar / awesome-lakehouse-guide
Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture
☆140Jan 21, 2026Updated 3 weeks ago
Alternatives and similar repositories for awesome-lakehouse-guide
Users that are interested in awesome-lakehouse-guide are comparing it to the libraries listed below
- A repository of blogs/videos that presents how Apache Iceberg is being used in Production by various orgs☆18Jul 31, 2023Updated 2 years ago
- "Nature's economy shall be the base for our own, for it is immutable, but ours is secondary. An economist without knowledge of nature is …☆20May 31, 2021Updated 4 years ago
- MPC Server for PySpark inspired by LakeSail☆17Feb 7, 2026Updated last week
- Monitoring and insights on your data lakehouse tables☆32Jan 28, 2026Updated 2 weeks ago
- Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processin…☆1,161Updated this week
- Deploy a complete data stack in just a couple of minutes.☆15Mar 6, 2024Updated last year
- Local Environment to Practice Data Engineering☆144Dec 30, 2024Updated last year
- ☆15Mar 27, 2023Updated 2 years ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake.☆30Feb 1, 2026Updated 2 weeks ago
- ☆21Feb 5, 2024Updated 2 years ago
- Mock streaming data generator☆17May 31, 2024Updated last year
- Practical Data Engineering: A Hands-On Real-Estate Project Guide☆769Sep 3, 2024Updated last year
- Iceberg Playground in a Box☆67Jun 27, 2025Updated 7 months ago
- Ibis analytics, with Ibis (and more!)☆24Sep 24, 2024Updated last year
- Apache Spark Kubernetes Operator☆257Updated this week
- Sets up Spark containers (Master, Workers, and History Server) + Jupyter☆21Jun 17, 2024Updated last year
- Cost Efficient Data Pipelines with DuckDB☆61May 14, 2025Updated 9 months ago
- Delta Lake examples☆239Oct 8, 2024Updated last year
- Open source stack lakehouse☆25Mar 2, 2024Updated last year
- Personal Finance Project to automatically collect Swiss banking transactions into a DWH and visualise them☆25Mar 3, 2024Updated last year
- Don't Panic. This guide will help you when it feels like the end of the world.☆30Feb 7, 2026Updated last week
- Operator for Apache Spark-on-Kubernetes for Stackable Data Platform☆69Feb 6, 2026Updated last week
- dbt (data build tool) projects targeting AWS analytics services (redshift, glue, emr, athena) and open table formats☆31Apr 13, 2023Updated 2 years ago
- Apache DataFusion Comet Spark Accelerator☆1,134Updated this week
- 📙 Awesome Data Catalogs and Observability Platforms.☆993Aug 14, 2025Updated 6 months ago
- Docker environment to stream data from Kafka to Iceberg tables☆30Feb 27, 2024Updated last year
- In-browser data analysis using SQL | Powered by duckdb-wasm☆26Dec 21, 2025Updated last month
- AI agent debugging, collaboration, and trace observability. Built for teams using CrewAI, OpenAI, and more.☆13Updated this week
- Open, Multi-modal Catalog for Data & AI☆3,305Updated this week
- End-to-end data engineering project to get insights from PyPI using Python, DuckDB, MotherDuck & Evidence☆233Dec 10, 2025Updated 2 months ago
- The smallest DuckDB SQL orchestrator on Earth.☆337Nov 22, 2025Updated 2 months ago
- Compare DuckDB, Polars and Pandas for generating an artificial dataset of persons and companies☆35Aug 31, 2023Updated 2 years ago
- Data Mesh Architecture☆84Oct 15, 2025Updated 4 months ago
- Beyond Vibe Coding. Code, Planning, Documentation and Product Management agents.☆70Jun 16, 2025Updated 8 months ago
- This project implements a Lakehouse Medallion Architecture using modern Data Stack tools such as Fivetran, Snowflake and dbt. The fictici…☆14Sep 30, 2024Updated last year
- ☆18Jul 4, 2025Updated 7 months ago
- ☆32Apr 4, 2022Updated 3 years ago
- ☆10Sep 28, 2023Updated 2 years ago
- MLOps Deploy Solutions with Rust☆38Sep 5, 2023Updated 2 years ago