manuzhang/awesome-lakehouse

A curated list of awesome lakehouse frameworks, applications, etc.

☆49

Alternatives and similar repositories for awesome-lakehouse

Users that are interested in awesome-lakehouse are comparing it to the libraries listed below.

prohandler / GS-Bulk-Emails
Google Apps Script that sends a number of emails from a specific number and tracks the open status of each email
☆17 · Dec 11, 2024 · Updated last year
aws-samples / apache-xtable-on-aws-samples
☆11 · Updated this week
Upsolver / iceberg-diag
☆30 · Dec 4, 2024 · Updated last year
lakevision-project / lakevision
Lakevision is a tool which provides insights into your Apache Iceberg based Data Lakehouse.
☆52 · Apr 11, 2026 · Updated 3 months ago
ray-project / deltacat
A portable Multimodal Lakehouse powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to you…
☆282 · Apr 17, 2026 · Updated 3 months ago
justinj / nulldb
☆14 · Jun 10, 2024 · Updated 2 years ago
criccomini / hive-metastore-standalone
Apache Hive Metastore in Standalone Mode With Docker
☆14 · Jul 22, 2024 · Updated 2 years ago
b0wter / fbrary
Create, manage and edit your audio book library from the command line.
☆10 · Oct 20, 2024 · Updated last year
linkedin / openhouse
Open Control Plane for Tables in Data Lakehouse
☆392 · Updated this week
DataChefHQ / aws-data-landing-zone
The Data Landing Zone is a CDK Construct designed to create a landing zone tailored for supporting and enabling AI, data-driven, data mes…
☆23 · Updated this week
awslabs / data-solutions-framework-on-aws
An open-source framework that simplifies implementation of data solutions.
☆147 · Dec 2, 2025 · Updated 7 months ago
rajagurunath / lakehouse-sharing
A table-format-agnostic data sharing framework
☆42 · Feb 4, 2024 · Updated 2 years ago
adamgfraser / 0-to-100-with-zio-test
☆14 · May 28, 2020 · Updated 6 years ago
Eventual-Inc / daft-cli
A CLI for spinning up and managing Ray clusters for the Daft Query Engine.
☆14 · Feb 15, 2025 · Updated last year
projectnessie / iceberg-catalog-migrator
CLI tool to bulk migrate tables from one catalog to another without a data copy
☆85 · Apr 12, 2025 · Updated last year
unitycatalog / unitycatalog-rs
Open, Multi-modal Catalog for Data & AI, written in Rust
☆86 · Sep 30, 2024 · Updated last year
Ackuq / spark-pit
Point-in-Time optimizations for Apache Spark
☆30 · Jan 18, 2024 · Updated 2 years ago
lakekeeper / console
A lightweight UI for Lakekeeper
☆19 · Updated this week
softwaremill / scala3-macro-debug
☆16 · May 21, 2021 · Updated 5 years ago
olympiaformat / olympia
Olympia is a storage-only open catalog format for big data analytics, ML & AI.
☆16 · May 5, 2025 · Updated last year
apache / iceberg-python
PyIceberg
☆1,102 · Updated this week
byte-genie / examples-genie
Usage examples for byte-genie API
☆12 · Apr 27, 2024 · Updated 2 years ago
morristai / iceberg-mcp
MCP server for Apache Iceberg
☆34 · Nov 17, 2025 · Updated 8 months ago
sodadata / soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
☆64 · Mar 23, 2026 · Updated 4 months ago
awslabs / amazon-emr-vscode-toolkit
A VS Code extension to make it easier to manage and develop Spark jobs on EMR
☆39 · Feb 17, 2025 · Updated last year
kaaveland / pyarrowfs-adlgen2
Use pyarrow with Azure Data Lake gen2
☆29 · Jun 27, 2024 · Updated 2 years ago
sarmadgardezi / Google-Spreadsheet-Formulas
How to add formulas to Google Spreadsheet using Google Apps Script - Sarmad Gardezi
☆17 · Apr 24, 2025 · Updated last year
astrojuanlu / kedro-init
A simple CLI command that initialises a Kedro project from an existing Python package
☆11 · Aug 23, 2024 · Updated last year
julienledem / parquet-metadata-visualizer
Claude Code-generated Parquet metadata visualizer that runs in your browser
☆15 · Dec 8, 2025 · Updated 7 months ago
lance-format / lance-namespace
Lance Namespace is an open specification for describing access and operations against a collection of tables in a multimodal lakehouse
☆56 · Jul 3, 2026 · Updated 3 weeks ago
amanparmar17 / Kafka_Pyspark
Base Kafka producer and consumer, Flask API, and PySpark Structured Streaming job
☆11 · Oct 20, 2021 · Updated 4 years ago
Query-farm / pyroscope
DuckDB Pyroscope Extension for Continuous Profiling
☆21 · Feb 18, 2026 · Updated 5 months ago
lakekeeper / lakekeeper
Lakekeeper is an Apache-licensed, secure, fast, and easy-to-use Apache Iceberg REST Catalog written in Rust.
☆1,399 · Updated this week
typelevel / cats-effect-shell
Command line debugging console for Cats Effect
☆19 · Apr 2, 2024 · Updated 2 years ago
jdegoes / zio
ZIO: a principled, powerful, standalone effect data type for any Scala project.
☆13 · Mar 28, 2025 · Updated last year
dair-ai / data_science_writing_primer
Writing Primer for Data Scientists
☆18 · Feb 19, 2020 · Updated 6 years ago
TFMV / icebox
Iceberg Playground in a Box
☆70 · Apr 8, 2026 · Updated 3 months ago
mag1cfrog / timeseries-table-format
Rust-native time-series table format with gap/overlap tracking and SQL queries
☆16 · Mar 18, 2026 · Updated 4 months ago
canimus / cuallee
Possibly the fastest DataFrame-agnostic quality check library in town.
☆248 · Feb 5, 2026 · Updated 5 months ago