An S3 shuffle plugin for Apache Spark to enable elastic scaling for generic Spark workloads.
☆52 · Sep 17, 2025 · Updated 8 months ago
Alternatives and similar repositories for spark-s3-shuffle
Users that are interested in spark-s3-shuffle are comparing it to the libraries listed below.
- A service which allows Hive Metastore Listeners to be deployed outside of the Hive Metastore Service ☆13 · Mar 26, 2026 · Updated 2 months ago
- This project provides a reverse proxy for Spark UI on Kubernetes ☆16 · Oct 12, 2023 · Updated 2 years ago
- benchmark-for-spark ☆18 · May 7, 2025 · Updated last year
- Remote shuffle service for Apache Spark to store shuffle data on remote servers. ☆335 · Sep 29, 2023 · Updated 2 years ago
- JVM integration for Weld ☆16 · Sep 24, 2018 · Updated 7 years ago
- How to create and record demos in terminal sessions ☆11 · May 3, 2024 · Updated 2 years ago
- A re-implementation of Hadoop DistCP in Apache Spark ☆47 · Dec 20, 2023 · Updated 2 years ago
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange ☆131 · Dec 19, 2024 · Updated last year
- Rust based high-performance Apache Uniffle shuffle-server ☆65 · Apr 24, 2026 · Updated last month
- A Python library and command line utility for manipulating and plotting stellar lightcurves. ☆10 · Jun 14, 2016 · Updated 9 years ago
- ☆18 · Apr 6, 2026 · Updated last month
- ☆10 · Jun 29, 2021 · Updated 4 years ago
- Apache Celeborn is an elastic and high-performance service for shuffle and spilled data. ☆1,047 · Updated this week
- Spark Terasort ☆122 · Apr 21, 2023 · Updated 3 years ago
- Deploy Apache Doris (incubating) FE & BE using shell scripts ☆11 · Jul 8, 2019 · Updated 6 years ago
- Uniffle is a high performance, general purpose Remote Shuffle Service. ☆451 · May 17, 2026 · Updated last week
- Parquet file generator ☆22 · Apr 17, 2018 · Updated 8 years ago
- Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shu… ☆257 · Apr 7, 2023 · Updated 3 years ago
- The gateway component to make Spark on K8s much easier for Spark users. ☆217 · May 6, 2026 · Updated 2 weeks ago
- Tutorial on how to setup Trino and Apache Ranger using docker ☆41 · Feb 23, 2026 · Updated 3 months ago
- HDFS based on Java implementation as a remote ObjectStore for DataFusion ☆10 · Feb 13, 2024 · Updated 2 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆93 · Mar 5, 2024 · Updated 2 years ago
- Spline agent for Apache Spark ☆202 · Updated this week
- Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments. ☆286 · Feb 24, 2026 · Updated 3 months ago
- Python package for querying iceberg data through duckdb. ☆75 · Feb 12, 2024 · Updated 2 years ago
- SwiftLake: Java SQL engine built on Apache Iceberg and DuckDB for efficient lakehouse reads and writes ☆32 · Aug 13, 2025 · Updated 9 months ago
- ☆18 · Sep 15, 2018 · Updated 7 years ago
- A package to run DuckDB queries from Apache Airflow. ☆21 · Jun 17, 2024 · Updated last year
- SparkCube is an open-source project for extremely fast OLAP data analysis. SparkCube is an extension of Apache Spark. ☆136 · Mar 6, 2023 · Updated 3 years ago
- HDFS Native Client in Rust via HDFS C API libhdfs ☆41 · Jan 27, 2025 · Updated last year
- An Extensible Data Skipping Framework ☆48 · Jul 15, 2025 · Updated 10 months ago
- Testing Sandbox for Hadoop Ecosystem Components ☆45 · Apr 29, 2026 · Updated 3 weeks ago
- Client libraries of end users of Apache Kyuubi ☆11 · May 15, 2026 · Updated last week
- Python Repository of the Institute of Astronomy @ KU Leuven ☆20 · Nov 5, 2020 · Updated 5 years ago
- The ultimate Vim configuration: .vimrc (heavily customized, uncompromising and opinionated) ☆11 · Mar 2, 2024 · Updated 2 years ago
- ☆49 · Feb 14, 2022 · Updated 4 years ago
- Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce. ☆169 · Nov 30, 2023 · Updated 2 years ago
- ☆23 · Feb 7, 2024 · Updated 2 years ago
- My custom Helm Chart repository ☆18 · Dec 20, 2025 · Updated 5 months ago