Shed light on your data layout in order to monitor the health of your Lakehouse tables and identify when data maintenance operations should be performed.
☆10 · updated Jul 31, 2023
Alternatives and similar repositories for lighthouse
Users who are interested in lighthouse are comparing it to the libraries listed below.
- Delta Acceptance Testing ☆23 · updated Aug 25, 2025
- JumpSpark - A modern cookiecutter template for PySpark projects with batteries included. ☆10 · updated May 12, 2023
- (no description) ☆59 · updated Jan 3, 2024
- Delta Lake helper methods. No Spark dependency. ☆22 · updated Jan 19, 2026
- mercury-graph is a Python library that offers graph analytics capabilities with a technology-agnostic API. ☆39 · updated Mar 21, 2025
- Pandas helper functions ☆31 · updated Feb 19, 2023
- Instant search for and access to many datasets in PySpark. ☆34 · updated Oct 6, 2022
- A minimalistic Rust implementation of a Delta Sharing server. ☆97 · updated Mar 17, 2025
- stac-fastapi implementation with a DuckDB backend. ☆15 · updated Sep 14, 2025
- Let Pydantic and Shapely work together! ☆18 · updated Jan 27, 2026
- Collaborative Synchronized Corpus Annotation Tool ☆11 · updated Dec 31, 2018
- PySpark schema generator ☆44 · updated Feb 23, 2023
- This repository contains an NLP model trained to handle flight reservations for a company we call Cloud Airlines. … ☆10 · updated Sep 17, 2019
- Harmonic track list maker based on the Camelot key system. ☆11 · updated Feb 19, 2020
- (no description) ☆12 · updated May 25, 2017
- Service to evaluate quality measure and cohort specifications against a target patient data set. ☆11 · updated Jun 2, 2022
- A Delta Lake reader for Dask ☆53 · updated Jul 29, 2025
- (no description) ☆12 · updated Sep 19, 2022
- PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows ☆45 · updated Jan 24, 2026
- Exploring modern RESTful services for gridded data ☆12 · updated May 8, 2022
- An example of a Spark Connect extension. ☆15 · updated Mar 5, 2024
- Get a nicely-chunked local copy of the biomedical literature (to use for other projects)! ☆14 · updated Jun 10, 2024
- Python library for manipulating data split into a collection of groups stored in Zarr format. ☆13 · updated Jul 11, 2025
- Python3 wrapper for parallelized gene prediction using Prodigal ☆11 · updated Mar 3, 2023
- Trying out the Polars DataFrame library with Delta Lake ... feat. Python. ☆12 · updated Jan 29, 2025
- NOAA Phase 2 Hydrological Data Processing ☆13 · updated Oct 9, 2023
- ADT support for Flink with Shapeless ☆12 · updated Jan 11, 2020
- A benchmark tool for lakehouses. ☆14 · updated Mar 12, 2023
- [WORK IN PROGRESS] Create STAC Items from vector datasets ☆10 · updated Dec 11, 2023
- An instance segmentation challenge on basketball images, with a particular focus on occlusion resolution. An opportunity to publish at MM… ☆16 · updated Aug 8, 2023
- Go library for efficient skyline queries ☆18 · updated Aug 17, 2025
- Terraform resources to set up an OAuth2-protected MLflow server in AWS infrastructure ☆10 · updated Aug 18, 2023
- Delta Lake and filesystem helper methods ☆50 · updated Feb 29, 2024
- Published in PLOS ONE. Phage-host interaction prediction tool that uses protein language models to represent the receptor-binding protein… ☆17 · updated Jul 13, 2025
- Extension that adds geo functionality to the DataFusion query engine. ☆11 · updated Apr 26, 2024
- (no description) ☆18 · updated Nov 27, 2020
- Spatio Temporal Asset Tasking with FastAPI ☆15 · updated Apr 1, 2025
- (no description) ☆15 · updated Apr 2, 2024
- Xarray backend to map an ECMWF-style request to a service onto an Xarray Dataset ☆14 · updated Dec 5, 2025