IBM / spark-s3-shuffle
An S3 shuffle plugin for Apache Spark to enable elastic scaling for generic Spark workloads.
☆52Sep 17, 2025Updated 4 months ago
Alternatives and similar repositories for spark-s3-shuffle
Users that are interested in spark-s3-shuffle are comparing it to the libraries listed below
- A service which allows Hive Metastore Listeners to be deployed outside of the Hive Metastore Service☆13Jul 23, 2025Updated 6 months ago
- Deploy Apache Doris (incubating) FE & BE using shell scripts☆11Jul 8, 2019Updated 6 years ago
- A re-implementation of Hadoop DistCP in Apache Spark☆47Dec 20, 2023Updated 2 years ago
- GEDS is a distributed ephemeral data store that enables flexible scaling of compute and storage. It uses a centralized name-node to store…☆16Sep 17, 2025Updated 4 months ago
- Use Claude Code with your Copilot subscription☆18May 17, 2025Updated 8 months ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores.☆20Oct 11, 2021Updated 4 years ago
- Remote shuffle service for Apache Spark to store shuffle data on remote servers.☆336Sep 29, 2023Updated 2 years ago
- benchmark-for-spark☆18May 7, 2025Updated 9 months ago
- Rust-based high-performance Apache Uniffle shuffle-server☆60Feb 4, 2026Updated last week
- JVM integration for Weld☆16Sep 24, 2018Updated 7 years ago
- A package to run DuckDB queries from Apache Airflow.☆20Jun 17, 2024Updated last year
- This project provides a reverse proxy for Spark UI on Kubernetes☆17Oct 12, 2023Updated 2 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.☆91Mar 5, 2024Updated last year
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange☆130Dec 19, 2024Updated last year
- Tutorial on how to set up Trino and Apache Ranger using Docker☆41Jul 21, 2024Updated last year
- Uniffle is a high performance, general purpose Remote Shuffle Service.☆443Jan 30, 2026Updated 2 weeks ago
- Open-source Chrome extension that detects clues to a Perplexity answer on the cited webpage.☆24Feb 10, 2025Updated last year
- Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.☆285Nov 26, 2025Updated 2 months ago
- Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.☆1,037Feb 5, 2026Updated last week
- Alerting and monitoring tool for Apache Spark☆23May 20, 2022Updated 3 years ago
- Local AWS EMR - A local service that imitates AWS EMR☆27Jul 5, 2023Updated 2 years ago
- Demo for a service-oriented application hosted on a Hadoop YARN cluster for HA and scheduling☆23Apr 2, 2018Updated 7 years ago
- Storage Benchmark Kit☆33Nov 5, 2025Updated 3 months ago
- A Spark SQL extension which provides SQL Standard Authorization for Apache Spark | This repo is contributed to Apache Kyuubi | The project has migrated to Apa…☆182Apr 6, 2022Updated 3 years ago
- Advanced block device testing/file system testing, targeting SNIA-compatible reporting☆12Oct 15, 2025Updated 3 months ago
- SwiftLake: Java SQL engine built on Apache Iceberg and DuckDB for efficient lakehouse reads and writes☆30Aug 13, 2025Updated 6 months ago
- Spline agent for Apache Spark☆201Jan 21, 2026Updated 3 weeks ago
- Export Airflow metrics (from MySQL) in Prometheus format☆29Apr 15, 2025Updated 9 months ago
- Dione - a Spark and HDFS indexing library☆52Oct 27, 2025Updated 3 months ago
- Spark ClickHouse Connector built on the DataSourceV2 API☆211Feb 1, 2026Updated last week
- This is a library for SQL optimizing/rewriting including Materialized View rewrite☆69Jun 21, 2022Updated 3 years ago
- Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http:…☆72Jan 1, 2023Updated 3 years ago
- Remote Shuffle Service for Flink☆191Jan 6, 2023Updated 3 years ago
- Apache DataFusion Comet Spark Accelerator☆1,130Feb 7, 2026Updated last week
- A batch-processing system based on Spring Boot and Spring Batch.☆11Sep 10, 2018Updated 7 years ago
- Lustre Repository with MS patches☆13Feb 6, 2026Updated last week
- SparkCube is an open-source project for extremely fast OLAP data analysis. SparkCube is an extension of Apache Spark.☆136Mar 6, 2023Updated 2 years ago
- Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol☆34Sep 8, 2022Updated 3 years ago
- A VS Code Extension to make it easier to manage and develop Spark jobs on EMR☆39Feb 17, 2025Updated 11 months ago