steveloughran / zero-rename-committer
Paper: A Zero-rename committer for object stores
☆20Updated 4 years ago
Alternatives and similar repositories for zero-rename-committer
Users who are interested in zero-rename-committer are comparing it to the libraries listed below.
- Bulletproof Apache Spark jobs with fast root cause analysis of failures.☆72Updated 4 years ago
- Splittable Gzip codec for Hadoop☆70Updated last week
- Spark Structured Streaming State Tools☆34Updated 5 years ago
- A testing framework for Trino☆26Updated 3 months ago
- type-class based data cleansing library for Apache Spark SQL☆78Updated 6 years ago
- A library that brings useful functions from various modern database management systems to Apache Spark☆59Updated last year
- ☆22Updated 6 years ago
- Dione - a Spark and HDFS indexing library☆52Updated last year
- Data Sketches for Apache Spark☆22Updated 2 years ago
- An example of building kubernetes operator (Flink) using Abstract operator's framework☆26Updated 6 years ago
- A Spark plugin for CPU and memory profiling☆18Updated last year
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark☆29Updated 8 months ago
- Provide functionality to build statistical models to repair dirty tabular data in Spark☆12Updated 2 years ago
- Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌☆29Updated 5 years ago
- Spark stream from kafka(json) to s3(parquet)☆15Updated 6 years ago
- UberScriptQuery, a SQL-like DSL to make writing Spark jobs super easy☆61Updated last year
- Extensible streaming ingestion pipeline on top of Apache Spark☆45Updated this week
- functionstest☆33Updated 8 years ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an…☆61Updated 10 months ago
- Cascading on Apache Flink®☆54Updated last year
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.☆88Updated last year
- ## Auto-archived due to inactivity. ## Simple JVM Profiler Using StatsD and Other Metrics Backends☆15Updated last year
- Schema Registry integration for Apache Spark☆40Updated 2 years ago
- something to help you spark☆65Updated 6 years ago
- A Giter8 template for scio☆31Updated 5 months ago
- Google Spreadsheets datasource for SparkSQL and DataFrames☆57Updated last year
- JSON schema parser for Apache Spark☆81Updated 2 years ago
- Spark job for compacting avro files together☆12Updated 7 years ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores.☆20Updated 3 years ago
- A Spark metrics sink that pushes to InfluxDb☆51Updated 4 years ago