oxhead / scout
Large-scale performance data of Hadoop and Spark on AWS
☆19Updated 7 years ago
Alternatives and similar repositories for scout
Users that are interested in scout are comparing it to the libraries listed below
- Elastic ephemeral storage☆122Updated 3 years ago
- Fast, predictable data analytics based on (and API-compatible with) Apache Spark☆25Updated 8 years ago
- Statistical Workload Injector for MapReduce - Project at UC Berkeley AMP Lab☆129Updated 11 years ago
- Automatically exported from code.google.com/p/cluster-scheduler-simulator☆171Updated 3 years ago
- TPC-H queries in Apache Spark SQL using native DataFrames API☆98Updated last year
- ☆197Updated 6 years ago
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange☆130Updated last year
- Code for Ernest☆34Updated 2 years ago
- A stateful serverless platform☆245Updated 3 years ago
- FaaSFlow: Enable Efficient Workflow Execution for Function-as-a-Service☆81Updated last year
- Mirror of Apache Crail (Incubating)☆150Updated 3 years ago
- Performance Analysis Tool☆78Updated last month
- A benchmark suite for serverless computing☆232Updated 10 months ago
- Use the TPC-DS benchmark to test Spark SQL performance☆183Updated 5 years ago
- Benchmark Suite for Apache Spark☆241Updated 2 years ago
- FaaSNet: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute (USENIX ATC'21)☆55Updated 4 years ago
- Tiresias is a GPU cluster manager for distributed deep learning training.☆164Updated 5 years ago
- This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvid…☆257Updated 6 years ago
- Scripts and code for PARTIES (ASPLOS'19)☆30Updated 4 years ago
- ☆45Updated 3 years ago
- DS2 is an auto-scaling controller for distributed streaming dataflows☆91Updated 2 years ago
- ☆53Updated 2 years ago
- Wukong: A scalable and locality-enhanced serverless parallel framework (ACM SoCC'20)☆76Updated last year
- Huawei Cloud datasets☆82Updated last month
- RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.☆357Updated 2 weeks ago
- Snowflake dataset containing statistics for 70 million queries over a 14-day period☆115Updated 4 years ago
- The trace player is used to replay anonymized traces for a registry, but can also be used with plugins to simulate caching or prefetching…☆18Updated 5 years ago
- A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer☆52Updated 2 years ago
- This repository contains experimental tools we developed to forecast a cluster's resource (CPU or memory) usage.☆44Updated 4 years ago
- Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020☆136Updated last year