manuzhang / awesome-lakehouse
A curated list of awesome lakehouse frameworks, applications, etc.
☆21 Updated 3 weeks ago
Alternatives and similar repositories for awesome-lakehouse:
Users who are interested in awesome-lakehouse are comparing it to the libraries listed below.
- A table-format-agnostic data sharing framework ☆38 Updated 11 months ago
- CLI tool to bulk migrate the tables from one catalog to another without a data copy ☆70 Updated this week
- Multi-hop declarative data pipelines ☆103 Updated this week
- Delta reader for the Ray open-source toolkit for building ML applications ☆43 Updated 11 months ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆28 Updated 3 weeks ago
- ☆79 Updated last year
- Apache Hive Metastore as a Standalone server in Docker ☆67 Updated 4 months ago
- The Internals of Spark on Kubernetes ☆70 Updated 2 years ago
- Unity Catalog UI ☆39 Updated 4 months ago
- ☆47 Updated 5 months ago
- Magic to help Spark pipelines upgrade ☆34 Updated 3 months ago
- Pythonic Iceberg REST Catalog ☆69 Updated 4 months ago
- A Spark Connector that reads data from / writes data to Arrow-Flight end-points with Arrow-Flight and Flight-SQL ☆38 Updated 3 months ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆93 Updated this week
- Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture ☆50 Updated this week
- ☆15 Updated 6 months ago
- Monitoring and insights on your data lakehouse tables ☆27 Updated 2 weeks ago
- The Amazon S3 Tables catalog is a client library that bridges control plane operations provided by S3 Tables to engines like Apache Spark… ☆82 Updated last week
- dbt-starrocks contains all of the code enabling dbt to work with StarRocks ☆24 Updated 3 months ago
- A DuckDB-powered command line interface for Snowflake security, governance, operations, and cost optimization. ☆38 Updated 5 months ago
- LST-Bench is a framework that allows users to run benchmarks specifically designed for evaluating Log-Structured Tables (LSTs) such as De… ☆72 Updated this week
- Point-in-Time optimizations for Apache Spark ☆29 Updated 11 months ago
- ☆62 Updated this week
- Presto Trino with Apache Hive Postgres metastore ☆38 Updated 4 months ago
- Examples for using Apache Flink® with DataStream API, Table API, Flink SQL and connectors such as MySQL, JDBC, CDC, Kafka. ☆60 Updated last year
- In-Memory Analytics for Kafka using DuckDB ☆90 Updated this week
- A library that brings useful functions from various modern database management systems to Apache Spark ☆58 Updated last year
- Generate authentic looking mock data based on a SQL, JSON or Avro schema and produce to Kafka in JSON or Avro format. ☆155 Updated last month
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆86 Updated 10 months ago