brooklyn-data / delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
☆10 · Updated 2 years ago
Alternatives and similar repositories for delta
Users who are interested in delta are comparing it to the libraries listed below.
- Magic to help Spark pipelines upgrade ☆35 · Updated 8 months ago
- ☆14 · Updated 3 weeks ago
- RocksDB state storage implementation for Structured Streaming. ☆17 · Updated 4 years ago
- Examples of Spark 3.0 ☆47 · Updated 4 years ago
- Sample processing code using Spark 2.1+ and Scala ☆51 · Updated 5 years ago
- Apiary provides modules which can be combined to create a federated cloud data lake ☆36 · Updated last year
- Multi-stage, config-driven, SQL-based ETL framework using PySpark ☆25 · Updated 5 years ago
- Scalable CDC pattern implemented using PySpark ☆18 · Updated 6 years ago
- Apache Spark ETL Utilities ☆40 · Updated 8 months ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆88 · Updated last year
- ☆39 · Updated 6 years ago
- Provide functionality to build statistical models to repair dirty tabular data in Spark ☆12 · Updated 2 years ago
- Code snippets used in demos recorded for the blog. ☆37 · Updated 2 weeks ago
- A re-implementation of Hadoop DistCp in Apache Spark ☆47 · Updated last year
- Spark-Radiant is an Apache Spark performance and cost optimizer ☆25 · Updated 5 months ago
- A tool to get better debug info on Spark's memory usage ☆42 · Updated 5 years ago
- A table-format-agnostic data sharing framework ☆38 · Updated last year
- Enables synchronizing metadata changes (Create/Drop table/partition) from Hive Metastore to AWS Glue Data Catalog ☆35 · Updated last year
- A library that brings useful functions from various modern database management systems to Apache Spark ☆59 · Updated last year
- Spark Structured Streaming State Tools ☆34 · Updated 4 years ago
- Dione - a Spark and HDFS indexing library ☆52 · Updated last year
- Apache Ranger Plugin for S3 ☆20 · Updated 2 years ago
- ☆30 · Updated 2 weeks ago
- ☆40 · Updated 2 years ago
- Lab project to showcase Flink's performance differences between using a SQL query and implementing the same logic via the DataStream API ☆14 · Updated 5 years ago
- Schema Registry integration for Apache Spark ☆40 · Updated 2 years ago
- Apache Iceberg Documentation Site ☆42 · Updated last year
- Random implementation notes ☆34 · Updated 12 years ago
- Data Profiler for AWS Glue Data Catalog, an application as described in the AWS Big Data Blog post "Build an automatic data profiling and rep… ☆20 · Updated 5 years ago
- Port of TPC-DS dsdgen to Java ☆50 · Updated 10 months ago