paypal / dione
Dione - a Spark and HDFS indexing library
☆49Updated 5 months ago
Related projects:
- Extensible streaming ingestion pipeline on top of Apache Spark☆43Updated 5 months ago
- Spark-Radiant is an Apache Spark performance and cost optimizer☆25Updated last year
- Schema Registry integration for Apache Spark☆39Updated last year
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores.☆20Updated 2 years ago
- A library that brings useful functions from various modern database management systems to Apache Spark☆53Updated last year
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.☆86Updated 6 months ago
- Multi-hop declarative data pipelines☆86Updated last month
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in…☆82Updated 5 months ago
- CLI tool to bulk migrate tables from one catalog to another without a data copy☆51Updated this week
- RocksDB state storage implementation for Structured Streaming.☆16Updated 3 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures.☆72Updated 3 years ago
- A temporary home for LinkedIn's changes to Apache Iceberg (incubating)☆62Updated 4 months ago
- Spark Structured Streaming State Tools☆34Updated 4 years ago
- Apiary provides modules which can be combined to create a federated cloud data lake☆36Updated 5 months ago
- Demonstration of a Hive Input Format for Iceberg☆26Updated 3 years ago
- LinkedIn's version of Apache Calcite☆22Updated 5 months ago
- ☆23Updated last week
- Data Sketches for Apache Spark☆21Updated last year
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark☆29Updated 2 months ago
- ☆13Updated last month
- DynoYARN is a framework to run simulated YARN clusters and workloads for YARN scale testing.☆58Updated last year
- A Java connector for delta.io/sharing/ that allows you to easily ingest data on any JVM.☆13Updated 5 months ago
- The Internals of Apache Kafka☆47Updated 9 months ago
- A testing framework for Trino☆25Updated last month
- Dynamic Conformance Engine☆30Updated 4 months ago
- This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w…☆14Updated 6 months ago
- The Internals of Spark on Kubernetes☆71Updated 2 years ago
- Provide functionality to build statistical models to repair dirty tabular data in Spark☆12Updated last year
- Flowchart for debugging Spark applications☆100Updated last week
- Paper: A Zero-rename committer for object stores☆20Updated 3 years ago