brooklyn-data / delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
☆10Updated 2 years ago
Alternatives and similar repositories for delta:
Users who are interested in delta are comparing it to the libraries listed below:
- ☆13Updated this week
- Magic to help Spark pipelines upgrade☆34Updated 4 months ago
- RocksDB state storage implementation for Structured Streaming.☆17Updated 4 years ago
- Examples of Spark 3.0☆46Updated 4 years ago
- This project contains a couple of tools to analyze data around the Apache Flink community.☆18Updated 8 months ago
- ☆25Updated 5 months ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores.☆20Updated 3 years ago
- A library that brings useful functions from various modern database management systems to Apache Spark☆58Updated last year
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.☆88Updated 11 months ago
- A re-implementation of Hadoop DistCP in Apache Spark☆47Updated last year
- Spark-Radiant is an Apache Spark performance and cost optimizer☆25Updated last month
- Apiary provides modules which can be combined to create a federated cloud data lake☆36Updated 10 months ago
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection☆18Updated 8 years ago
- Schema Registry integration for Apache Spark☆40Updated 2 years ago
- Dione - a Spark and HDFS indexing library☆51Updated 10 months ago
- Quark is a data virtualization engine over analytic databases.☆98Updated 7 years ago
- Port of TPC-DS dsdgen to Java☆48Updated 6 months ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures.☆72Updated 3 years ago
- Lab for testing different Flink job latency optimization techniques covered in a Flink Forward 2021 talk☆27Updated 3 years ago
- Snowflake Data Source for Apache Spark.☆224Updated 2 months ago
- A tool to get better debug info on Spark's memory usage☆42Updated 5 years ago
- Apache Iceberg Documentation Site☆42Updated last year
- Spark SQL listener to record lineage information☆28Updated 4 years ago
- Plugin for Presto to allow addition of user functions easily☆116Updated 3 years ago
- Extensions available for use in Apiary☆10Updated 5 months ago
- Lab project to showcase Flink's performance differences between using a SQL query and implementing the same logic via the DataStream API☆14Updated 4 years ago
- Demonstration of a Hive Input Format for Iceberg☆26Updated 3 years ago
- Scalable CDC Pattern Implemented using PySpark☆18Updated 5 years ago
- Apache Spark-based data flow (ETL) framework which supports multiple read and write destinations of different types and also supports multiple…☆26Updated 3 years ago
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in…☆86Updated 10 months ago