An S3 shuffle plugin for Apache Spark that enables elastic scaling for generic Spark workloads.
☆52Sep 17, 2025Updated 6 months ago
Alternatives and similar repositories for spark-s3-shuffle
Users who are interested in spark-s3-shuffle are comparing it to the libraries listed below.
- A service which allows Hive Metastore Listeners to be deployed outside of the Hive Metastore Service☆13Mar 17, 2026Updated last week
- benchmark-for-spark☆18May 7, 2025Updated 10 months ago
- This project provides a reverse proxy for Spark UI on Kubernetes☆17Oct 12, 2023Updated 2 years ago
- Spark* Shuffle plugin to support shuffling through remote persistent memory over fabrics, which leverages the RDMA network and remote pe…☆14Sep 18, 2023Updated 2 years ago
- Remote shuffle service for Apache Spark to store shuffle data on remote servers.☆335Sep 29, 2023Updated 2 years ago
- JVM integration for Weld☆16Sep 24, 2018Updated 7 years ago
- Use Claude Code with your Copilot subscription☆19May 17, 2025Updated 10 months ago
- A library for reading data from Amazon S3 with optimised listing via Amazon SQS, using Spark SQL Streaming (or Structured Streaming).☆18Apr 20, 2024Updated last year
- A re-implementation of Hadoop DistCP in Apache Spark☆47Dec 20, 2023Updated 2 years ago
- A high-concurrency TCP server based on multithreading and epoll☆11Aug 4, 2018Updated 7 years ago
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange☆131Dec 19, 2024Updated last year
- A Python library and command line utility for manipulating and plotting stellar lightcurves.☆10Jun 14, 2016Updated 9 years ago
- Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.☆1,039Mar 19, 2026Updated last week
- Deploy Apache Doris (incubating) FE & BE using shell scripts☆11Jul 8, 2019Updated 6 years ago
- Uniffle is a high performance, general purpose Remote Shuffle Service.☆446Mar 19, 2026Updated last week
- Open-source Chrome extension that detects clues for a Perplexity answer on the cited webpage.☆24Feb 10, 2025Updated last year
- Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shu…☆257Apr 7, 2023Updated 2 years ago
- Elasticsearch REPL built on top of Jest☆23May 12, 2015Updated 10 years ago
- IceDB S3 Proxy to trick S3 clients into only seeing alive files☆13Dec 24, 2023Updated 2 years ago
- The gateway component to make Spark on K8s much easier for Spark users.☆216Dec 16, 2025Updated 3 months ago
- Alerting and monitoring tool for Apache Spark☆23May 20, 2022Updated 3 years ago
- Tutorial on how to set up Trino and Apache Ranger using Docker☆41Feb 23, 2026Updated last month
- An example application for the Ash Framework☆12Jan 3, 2023Updated 3 years ago
- HDFS based on Java implementation as a remote ObjectStore for DataFusion☆10Feb 13, 2024Updated 2 years ago
- Apache DataFusion Comet Spark Accelerator☆1,154Mar 19, 2026Updated last week
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.☆93Mar 5, 2024Updated 2 years ago
- Spline agent for Apache Spark☆202Mar 17, 2026Updated last week
- My custom IntelliJ colour scheme, nicknamed "happy hakking"☆11Jul 27, 2016Updated 9 years ago
- ☆16Jul 25, 2025Updated 8 months ago
- Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.☆285Feb 24, 2026Updated last month
- Demo for service oriented application hosted on Hadoop YARN cluster for HA and scheduling☆23Apr 2, 2018Updated 7 years ago
- Python package for querying iceberg data through duckdb.☆74Feb 12, 2024Updated 2 years ago
- Dione - a Spark and HDFS indexing library☆52Oct 27, 2025Updated 5 months ago
- The open source version of the Amazon EMR Management Guide. You can submit feedback & requests for changes by submitting issues in this r…☆62Jun 15, 2023Updated 2 years ago
- SwiftLake: Java SQL engine built on Apache Iceberg and DuckDB for efficient lakehouse reads and writes☆30Aug 13, 2025Updated 7 months ago
- ☆18Sep 15, 2018Updated 7 years ago
- A package to run DuckDB queries from Apache Airflow.☆21Jun 17, 2024Updated last year
- SparkCube is an open-source project for extremely fast OLAP data analysis. SparkCube is an extension of Apache Spark.☆137Mar 6, 2023Updated 3 years ago
- An Extensible Data Skipping Framework☆48Jul 15, 2025Updated 8 months ago