Spark* shuffle plugin that supports shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local disks.
☆21 · Mar 15, 2024 · Updated 2 years ago
Alternatives and similar repositories for remote-shuffle
Users that are interested in remote-shuffle are comparing it to the libraries listed below.
- Spark* shuffle plugin that supports shuffling through remote persistent memory over fabrics, which leverages the RDMA network and remote pe… ☆14 · Sep 18, 2023 · Updated 2 years ago
- Hadoop InputFormat for http://druid.io/ ☆10 · Oct 26, 2016 · Updated 9 years ago
- Ted is a line-oriented text editor and formatter ☆12 · Jun 29, 2020 · Updated 5 years ago
- An ambient sound generator using free sounds from BBC Sound Effects ☆14 · Dec 3, 2023 · Updated 2 years ago
- HTML content/article extractor in Scala ☆18 · May 23, 2018 · Updated 7 years ago
- Firestorm is a Remote Shuffle Service and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shu… ☆257 · Apr 7, 2023 · Updated 3 years ago
- Spark Shuffle Optimization with RDMA+AEP ☆30 · May 23, 2023 · Updated 2 years ago
- Spark* plug-in for accelerating Spark* SQL performance by using cache and index at the SQL data source layer. ☆37 · Jan 3, 2023 · Updated 3 years ago
- ☆12 · Apr 7, 2025 · Updated last year
- ☆18 · Nov 4, 2024 · Updated last year
- SparkCube is an open-source project for extremely fast OLAP data analysis. SparkCube is an extension of Apache Spark. ☆136 · Mar 6, 2023 · Updated 3 years ago
- ☆10 · Oct 12, 2022 · Updated 3 years ago
- The project retains patches that have been submitted to the open-source community. ☆16 · Oct 22, 2017 · Updated 8 years ago
- A Python interface to gb-io, a fast GenBank parser written in Rust. ☆24 · Apr 23, 2026 · Updated last week
- Apache Spark - A unified analytics engine for large-scale data processing ☆16 · Jul 24, 2023 · Updated 2 years ago
- Mirror of Apache Hadoop Common ☆108 · Jul 8, 2020 · Updated 5 years ago
- HMM-guided metagenomic gene-targeted assembler using iterative de Bruijn graphs ☆18 · Oct 3, 2016 · Updated 9 years ago
- Mirror of Apache Livy (Incubating) ☆13 · Feb 11, 2026 · Updated 2 months ago
- Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations. ☆256 · Feb 21, 2023 · Updated 3 years ago
- A curated list of awesome Dropbox SDKs, open source libraries, and cool tools and services powered by Dropbox. ☆15 · Apr 6, 2016 · Updated 10 years ago
- Cache File System optimized for columnar formats and object stores ☆188 · Aug 11, 2022 · Updated 3 years ago
- Plugin to accelerate Spark SQL with the NEC Vector Engine. ☆19 · Aug 15, 2022 · Updated 3 years ago
- Supporting code for a Learning to Rank (LTR) presentation ☆16 · Oct 11, 2018 · Updated 7 years ago
- A list of awesome beginner-friendly projects. ☆12 · Oct 5, 2020 · Updated 5 years ago
- Memtier benchmark front-end ☆10 · May 9, 2023 · Updated 2 years ago
- Node.js Kafka Connect connector for Prometheus ☆13 · Dec 7, 2022 · Updated 3 years ago
- A library enabling DAG structuring of data processing programs such as ETLs ☆17 · Apr 13, 2026 · Updated 3 weeks ago
- Performance Analysis Tool ☆77 · Nov 25, 2025 · Updated 5 months ago
- DBT CLI MCP Server ☆19 · Jun 26, 2025 · Updated 10 months ago
- Log driver plugin for Docker explained. The boilerplate code here can also be used to write your own driver if you are feeling adventurou… ☆13 · Mar 13, 2019 · Updated 7 years ago
- This project provides a fully automated one-click experience to create Cloud and Kubernetes environments to run Data Analytics workloads like… ☆55 · Jan 2, 2023 · Updated 3 years ago
- MIRROR OF: The European Molecular Biology Open Software Suite (from git://anonscm.debian.org/debian-med/emboss.git) ☆32 · Feb 18, 2022 · Updated 4 years ago
- Hadoop Profiler, or hprofiler, is a tool that analyzes on- and off-CPU workloads in distributed computing environments. ☆24 · Jul 7, 2016 · Updated 9 years ago
- Easy way to send Finagle metrics to the Codahale Metrics library ☆42 · Apr 2, 2020 · Updated 6 years ago
- Scalable NameNode RPC Proxy for HDFS Federation ☆88 · Apr 19, 2016 · Updated 10 years ago
- Golang library for using persistent memory ☆29 · Oct 7, 2022 · Updated 3 years ago
- Redis Cluster Ansible role. ☆14 · Dec 4, 2019 · Updated 6 years ago
- Mirror of Apache Ranger ☆15 · Apr 5, 2024 · Updated 2 years ago
- The presentation at Spark Summit 2014 showing how 4Quant does production-scale image processing and analysis using Spark ☆16 · Jul 29, 2014 · Updated 11 years ago