adidas / lakehouse-engine
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, that serves as a scalable, distributed engine for lakehouse algorithms, data flows, and utilities for Data Products.
☆229 · Updated 2 months ago
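To make "configuration-driven" concrete, here is a minimal, self-contained sketch of the pattern: the pipeline is described entirely as a declarative config dict (read spec, ordered transform specs, write spec) that a generic runner interprets. The names `run_flow`, `input_spec`, `transform_specs`, and `output_spec` are assumptions for illustration, not the Lakehouse Engine's actual API, and a plain Python list stands in for a Spark DataFrame.

```python
# Illustrative sketch of a configuration-driven data flow (NOT the
# Lakehouse Engine's real API): the whole pipeline lives in a dict.

def run_flow(config: dict) -> list[dict]:
    """Execute a tiny pipeline described entirely by `config`."""
    # 1. Read: an in-memory list stands in for a Spark DataFrame reader.
    data = list(config["input_spec"]["rows"])

    # 2. Transform: apply each configured transformation in order.
    transforms = {
        "filter": lambda rows, spec: [
            r for r in rows if r[spec["column"]] >= spec["min"]
        ],
        "rename": lambda rows, spec: [
            {spec["mapping"].get(k, k): v for k, v in r.items()} for r in rows
        ],
    }
    for spec in config.get("transform_specs", []):
        data = transforms[spec["function"]](data, spec)

    # 3. Write: a real engine would write to Delta/Parquet; we return rows.
    return data

config = {
    "input_spec": {"rows": [{"id": 1, "amount": 5}, {"id": 2, "amount": 50}]},
    "transform_specs": [
        {"function": "filter", "column": "amount", "min": 10},
        {"function": "rename", "mapping": {"amount": "amount_eur"}},
    ],
    "output_spec": {"format": "delta"},  # unused in this sketch
}

print(run_flow(config))  # [{'id': 2, 'amount_eur': 50}]
```

The design point is that adding or reordering pipeline steps means editing the config, not the runner code, which is what makes such a framework reusable across many Data Products.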
Alternatives and similar repositories for lakehouse-engine:
Users interested in lakehouse-engine are comparing it to the libraries listed below.
- Delta Lake helper methods in PySpark ☆312 · Updated 4 months ago
- Delta Lake examples ☆214 · Updated 3 months ago
- A Python library to support running data quality rules while the Spark job is running ⚡ ☆167 · Updated last week
- Pythonic programming framework to orchestrate jobs in Databricks Workflows ☆192 · Updated 3 weeks ago
- PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows ☆41 · Updated 6 months ago
- Demo of using Nutter for testing Databricks notebooks in a CI/CD pipeline ☆150 · Updated 5 months ago
- Code samples, etc. for Databricks ☆62 · Updated this week
- Data product portal created by Dataminded ☆168 · Updated this week
- Data pipeline with dbt, Airflow, Great Expectations ☆160 · Updated 3 years ago
- Demo DAGs that show how to run dbt Core in Airflow using Cosmos ☆52 · Updated 3 months ago
- PySpark test helper methods with beautiful error messages ☆648 · Updated this week
- ☆107 · Updated 5 months ago
- Custom PySpark Data Sources ☆36 · Updated last month