LucaCanali / Miscellaneous
Includes notes on using Apache Spark, with a drill-down on Spark for Physics, how to run TPCDS on PySpark, and how to create histograms with Spark. Also includes tools for stress testing, measuring CPU performance, and generating I/O latency heat maps, plus Jupyter notebook examples for using various DB systems.
☆457 · Updated last month
Alternatives and similar repositories for Miscellaneous
Users who are interested in Miscellaneous are comparing it to the libraries listed below.
- Use the TPC-DS benchmark to test Spark SQL performance ☆184 · Updated 5 years ago
- Benchmark Suite for Apache Spark ☆241 · Updated 2 years ago
- The Internals of Spark SQL ☆484 · Updated last week
- Spark Terasort ☆121 · Updated 2 years ago
- All the things about TPC-DS in Apache Spark ☆109 · Updated 2 years ago
- This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spa… ☆808 · Updated 3 weeks ago
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange ☆130 · Updated last year
- This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvid… ☆258 · Updated 6 years ago
- Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations. ☆258 · Updated 2 years ago
- Remote shuffle service for Apache Spark to store shuffle data on remote servers. ☆336 · Updated 2 years ago
- TPC-H queries in Apache Spark SQL using native DataFrames API ☆98 · Updated 2 years ago
- Qubole Sparklens tool for performance tuning Apache Spark ☆587 · Updated last year
- The Internals of Delta Lake ☆187 · Updated 2 months ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆132 · Updated last month
- ☆314 · Updated 7 years ago
- Spark Shuffle Optimization with RDMA+AEP ☆30 · Updated 2 years ago
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in… ☆94 · Updated 8 months ago
- TPC-DS Kit for Impala ☆170 · Updated last year
- A re-implementation of Hadoop DistCP in Apache Spark ☆47 · Updated 2 years ago
- Performance Analysis Tool ☆78 · Updated 2 months ago
- TPC-DS benchmark kit with some modifications/fixes ☆356 · Updated last year
- An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads. ☆432 · Updated 4 years ago
- ☆393 · Updated 2 years ago
- An S3 Shuffle plugin for Apache Spark to enable elastic scaling for generic Spark workloads. ☆50 · Updated 4 months ago
- Spark RAPIDS plugin - accelerate Apache Spark with GPUs ☆958 · Updated last week
- A tool to get better debug info on Spark's memory usage ☆42 · Updated 6 years ago
- ACID Data Source for Apache Spark based on Hive ACID ☆96 · Updated 4 years ago
- Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments. ☆285 · Updated 2 months ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆232 · Updated 2 weeks ago
- A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer ☆52 · Updated 2 years ago