The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
☆285 Mar 4, 2026 Updated 3 weeks ago
Alternatives and similar repositories for lakehouse-engine
Users that are interested in lakehouse-engine are comparing it to the libraries listed below.
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ☆227 Mar 11, 2026 Updated 2 weeks ago
- ☆15 Aug 28, 2025 Updated 6 months ago
- A cloud data platform product to accelerate time to insights. Our open-source framework is designed for the real world. Stripping away th… ☆24 Mar 9, 2026 Updated 2 weeks ago
- Delta Lake helper methods in PySpark ☆328 Jan 19, 2026 Updated 2 months ago
- A Python Library to support running data quality rules while the spark job is running⚡ ☆201 Updated this week
- A DataOps framework for building a lakehouse. ☆56 Mar 18, 2026 Updated last week
- ☆18 Aug 6, 2024 Updated last year
- This repo is a collection of tools to deploy, manage and operate a Databricks based Lakehouse. ☆46 Jan 27, 2025 Updated last year
- Open, Multi-modal Catalog for Data & AI ☆3,336 Updated this week
- ☆18 May 26, 2025 Updated 10 months ago
- PySpark test helper methods with beautiful error messages ☆756 Updated this week
- Notebooks to learn Databricks Lakehouse Platform ☆42 Updated this week
- A Table format agnostic data sharing framework ☆42 Feb 4, 2024 Updated 2 years ago
- Notebooks and tips about Databricks ☆28 Nov 5, 2024 Updated last year
- Complete data engineering pipeline running on Minikube Kubernetes, Argo CD, Spark, Trino, S3, Delta lake, Postgres+ Debezium CDC, MySQL,… ☆28 May 19, 2025 Updated 10 months ago
- Upload of all my presentations which I've been doing in the past ☆10 Mar 8, 2026 Updated 2 weeks ago
- A portable Datamart and Business Intelligence suite built with Docker, sqlmesh + dbtcore, DuckDB and Superset ☆59 Mar 9, 2026 Updated 2 weeks ago
- Databricks Platform - Architecture, Security, Automation and much more!! ☆55 Updated this week
- Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team … ☆134 Updated this week
- adidas Data Mesh implementation ☆12 May 13, 2022 Updated 3 years ago
- pyspark methods to enhance developer productivity 📣 👯 🎉 ☆685 Mar 6, 2025 Updated last year
- Data pipeline project using Data Factory, Databricks and Cosmosdb Graph, deployed using Azure DevOps, secured using firewalls and Azure A… ☆11 Dec 14, 2022 Updated 3 years ago
- Code snippets used in demos recorded for the blog. ☆40 Mar 12, 2026 Updated 2 weeks ago
- 🏃‍♀️ Minimalist SQL orchestrator ☆314 Mar 17, 2026 Updated last week
- ☆18 Apr 10, 2025 Updated 11 months ago
- The source code for the book Modern Data Engineering with Apache Spark ☆39 Jul 26, 2022 Updated 3 years ago
- In this article, you will learn how to set up a real-time data processing and analytics environment using Docker, MySQL, Redpanda, MinIO,… ☆11 Jun 27, 2023 Updated 2 years ago
- Open Control Plane for Tables in Data Lakehouse ☆382 Mar 19, 2026 Updated last week
- A benchmark tool for lakehouses. ☆14 Mar 12, 2023 Updated 3 years ago
- ☆23 May 16, 2023 Updated 2 years ago
- ☆25 Feb 14, 2025 Updated last year
- ☆12 Mar 7, 2025 Updated last year
- A series of workshop modules introducing Feast feature store. ☆19 May 31, 2022 Updated 3 years ago
- Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipeli… ☆653 Mar 1, 2026 Updated 3 weeks ago
- Samples for fabric user data functions ☆26 Mar 16, 2026 Updated last week
- 🧱 A collection of supplementary utilities and helper notebooks to perform admin tasks on Databricks ☆57 Jul 4, 2025 Updated 8 months ago
- Yet Another (Spark) ETL Framework ☆21 Oct 21, 2023 Updated 2 years ago
- ☆42 Dec 19, 2023 Updated 2 years ago
- Nessie: Transactional Catalog for Data Lakes with Git-like semantics ☆1,439 Updated this week