An S3 shuffle plugin for Apache Spark that enables elastic scaling for generic Spark workloads.
☆52Sep 17, 2025Updated 5 months ago
Alternatives and similar repositories for spark-s3-shuffle
Users who are interested in spark-s3-shuffle compare it to the repositories listed below.
- A service which allows Hive Metastore Listeners to be deployed outside of the Hive Metastore Service☆13Jul 23, 2025Updated 7 months ago
- Deploy Apache Doris (incubating) FE & BE using shell scripts☆11Jul 8, 2019Updated 6 years ago
- A re-implementation of Hadoop DistCP in Apache Spark☆47Dec 20, 2023Updated 2 years ago
- Use Claude Code with your Copilot subscription☆19May 17, 2025Updated 9 months ago
- Spark* Shuffle plugin that supports shuffling through remote persistent memory over fabrics, which leverages the RDMA network and remote pe…☆14Sep 18, 2023Updated 2 years ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores.☆20Oct 11, 2021Updated 4 years ago
- Remote shuffle service for Apache Spark to store shuffle data on remote servers.☆335Sep 29, 2023Updated 2 years ago
- benchmark-for-spark☆18May 7, 2025Updated 9 months ago
- Rust based high-performance Apache Uniffle shuffle-server☆62Updated this week
- This project provides a reverse proxy for Spark UI on Kubernetes☆17Oct 12, 2023Updated 2 years ago
- JVM integration for Weld☆16Sep 24, 2018Updated 7 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.☆92Mar 5, 2024Updated 2 years ago
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange☆130Dec 19, 2024Updated last year
- Tutorial on how to set up Trino and Apache Ranger using Docker☆41Feb 23, 2026Updated last week
- ☆18Feb 5, 2026Updated last month
- A package to run DuckDB queries from Apache Airflow.☆21Jun 17, 2024Updated last year
- Uniffle is a high performance, general purpose Remote Shuffle Service.☆445Feb 28, 2026Updated last week
- Open-source Chrome extension that detects clues of a Perplexity answer on the cited webpage.☆24Feb 10, 2025Updated last year
- Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.☆284Feb 24, 2026Updated last week
- Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.☆1,039Feb 28, 2026Updated last week
- Alerting and monitoring tool for Apache Spark☆23May 20, 2022Updated 3 years ago
- MLlib Convolutional and Feedforward Neural Network implementation with a high level API and advanced optimizers.☆27Aug 30, 2017Updated 8 years ago
- Local AWS EMR - A local service that imitates AWS EMR☆27Jul 5, 2023Updated 2 years ago
- Python package for querying iceberg data through duckdb.☆74Feb 12, 2024Updated 2 years ago
- High Performance Network Library for RDMA☆28Jan 3, 2023Updated 3 years ago
- A tool for translating Scala source code into readable and maintainable Java code☆13Jan 3, 2026Updated 2 months ago
- Storage Benchmark Kit☆33Nov 5, 2025Updated 4 months ago
- A Spark SQL extension which provides SQL Standard Authorization for Apache Spark | This repo has been contributed to Apache Kyuubi | The project has been migrated to Apa…☆183Apr 6, 2022Updated 3 years ago
- Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shu…☆257Apr 7, 2023Updated 2 years ago
- ☆10Jun 29, 2021Updated 4 years ago
- SwiftLake: Java SQL engine built on Apache Iceberg and DuckDB for efficient lakehouse reads and writes☆30Aug 13, 2025Updated 6 months ago
- ☆43Feb 20, 2016Updated 10 years ago
- Script to prevent Macbook from sleeping when lid is closed☆31Nov 5, 2023Updated 2 years ago
- Spline agent for Apache Spark☆202Updated this week
- Dione - a Spark and HDFS indexing library☆52Oct 27, 2025Updated 4 months ago
- Export Airflow metrics (from mysql) in prometheus format☆29Apr 15, 2025Updated 10 months ago
- Spark Terasort☆121Apr 21, 2023Updated 2 years ago
- Spark ClickHouse Connector built on the DataSourceV2 API☆213Feb 20, 2026Updated 2 weeks ago
- This is a library for SQL optimizing/rewriting including Materialized View rewrite☆69Jun 21, 2022Updated 3 years ago