The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
☆287Jun 3, 2026Updated last week
Alternatives and similar repositories for lakehouse-engine
Users that are interested in lakehouse-engine are comparing it to the libraries listed below.
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow☆227Jun 8, 2026Updated last week
- ☆15Aug 28, 2025Updated 9 months ago
- A cloud data platform product to accelerate time to insights. Our open-source framework is designed for the real world. Stripping away th…☆25Updated this week
- Delta Lake helper methods in PySpark☆329Jan 19, 2026Updated 4 months ago
- A Python Library to support running data quality rules while the spark job is running⚡☆202May 19, 2026Updated 3 weeks ago
- A DataOps framework for building a lakehouse.☆57Jun 5, 2026Updated last week
- ☆18Aug 6, 2024Updated last year
- This repo is a collection of tools to deploy, manage and operate a Databricks based Lakehouse.☆44Jan 27, 2025Updated last year
- Open, Multi-modal Catalog for Data & AI☆3,419Updated this week
- PySpark test helper methods with beautiful error messages☆769May 20, 2026Updated 3 weeks ago
- M3D Engine is a Spark application for the development of scalable data transformations and ingestions in data lakes.☆19May 4, 2021Updated 5 years ago
- Notebooks to learn Databricks Lakehouse Platform☆44Updated this week
- A Table format agnostic data sharing framework☆41Feb 4, 2024Updated 2 years ago
- Complete data engineering pipeline running on Minikube Kubernetes, Argo CD, Spark, Trino, S3, Delta lake, Postgres+ Debezium CDC, MySQL,…☆29May 19, 2025Updated last year
- Upload of all my presentations which I've been doing in the past☆10May 21, 2026Updated 3 weeks ago
- Databricks Platform - Architecture, Security, Automation and much more!!☆56Jun 4, 2026Updated last week
- ☆20Oct 26, 2021Updated 4 years ago
- A portable Datamart and Business Intelligence suite built with Docker, sqlmesh + dbtcore, DuckDB and Superset☆60Apr 5, 2026Updated 2 months ago
- Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team …☆138May 15, 2026Updated last month
- adidas Data Mesh implementation☆12May 13, 2022Updated 4 years ago
- pyspark methods to enhance developer productivity 📣 👯 🎉☆687Jun 9, 2026Updated last week
- Data pipeline project using Data Factory, Databricks and Cosmosdb Graph, deployed using Azure DevOps, secured using firewalls and Azure A…☆11Dec 14, 2022Updated 3 years ago
- Code snippets used in demos recorded for the blog.☆42Apr 30, 2026Updated last month
- ☆18Jun 3, 2026Updated last week
- The source code for the book Modern Data Engineering with Apache Spark☆41Jul 26, 2022Updated 3 years ago
- 🏃‍♀️ Minimalist SQL orchestrator☆325Jun 9, 2026Updated last week
- Metadata Driven Development (m3d) is a cloud and platform agnostic framework for the automated creation, management and governance of dat…☆34May 23, 2023Updated 3 years ago
- Open Control Plane for Tables in Data Lakehouse☆389Jun 10, 2026Updated last week
- Extract Load Transform (ELT) framework is a metadata based batch orchestration framework for modern data platforms. Implemented using Azu…☆50Jun 5, 2026Updated last week
- A benchmark tool for lakehouses.☆13Mar 12, 2023Updated 3 years ago
- ☆23May 16, 2023Updated 3 years ago
- ☆26Feb 14, 2025Updated last year
- A series of workshop modules introducing Feast feature store.☆18May 31, 2022Updated 4 years ago
- Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipeli…☆651May 6, 2026Updated last month
- ☆13Mar 7, 2025Updated last year
- Samples for fabric user data functions☆27May 22, 2026Updated 3 weeks ago
- Yet Another (Spark) ETL Framework☆21Oct 21, 2023Updated 2 years ago
- ☆42Dec 19, 2023Updated 2 years ago
- A cross tenant metadata driven processing framework for Azure Data Factory and Azure Synapse Analytics achieved by coupling orchestration…☆185Feb 13, 2024Updated 2 years ago