A re-implementation of Hadoop DistCP in Apache Spark
☆47 · Dec 20, 2023 · Updated 2 years ago
Alternatives and similar repositories for spark-distcp
Users interested in spark-distcp are comparing it to the libraries listed below.
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 · Oct 11, 2021 · Updated 4 years ago
- Uniffle is a high performance, general purpose Remote Shuffle Service. ☆445 · Feb 28, 2026 · Updated last week
- On-the-fly translation of Spark programs to run natively on your Oracle DB. Your Spark programs require no changes. ☆35 · Apr 15, 2025 · Updated 10 months ago
- Java event logs collector for Hadoop and frameworks ☆41 · Mar 25, 2025 · Updated 11 months ago
- Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer. ☆37 · Jan 3, 2023 · Updated 3 years ago
- Client libraries of end users of Apache Kyuubi ☆11 · Jan 10, 2023 · Updated 3 years ago
- An S3 shuffle plugin for Apache Spark to enable elastic scaling for generic Spark workloads. ☆52 · Sep 17, 2025 · Updated 5 months ago
- Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments. ☆284 · Feb 24, 2026 · Updated last week
- Profiler for large-scale distributed Java applications (Spark, Scalding, MapReduce, Hive, ...) on YARN.