LucaCanali / Miscellaneous
Includes notes on using Apache Spark, with a drill-down on Spark for Physics, how to run TPCDS on PySpark, and how to create histograms with Spark. Also includes tools for stress testing and measuring CPU performance, and Jupyter notebook examples for using various DB systems.
☆455 Updated 3 months ago
Alternatives and similar repositories for Miscellaneous
Users who are interested in Miscellaneous are comparing it to the repositories listed below
- Use the TPC-DS benchmark to test Spark SQL performance ☆180 Updated 5 years ago
- This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spa… ☆785 Updated last week
- Benchmark Suite for Apache Spark ☆242 Updated 2 years ago
- Spark Terasort ☆121 Updated 2 years ago
- The Internals of Spark SQL ☆474 Updated this week
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange ☆128 Updated 8 months ago
- Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations. ☆257 Updated 2 years ago
- All the things about TPC-DS in Apache Spark ☆107 Updated 2 years ago
- This is archive of SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvid… ☆254 Updated 6 years ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆127 Updated 2 weeks ago
- Remote shuffle service for Apache Spark to store shuffle data on remote servers. ☆335 Updated last year
- TPC-DS benchmark kit with some modifications/fixes ☆342 Updated last year
- TPC-H queries in Apache Spark SQL using native DataFrames API ☆98 Updated last year
- TPC-DS Kit for Impala ☆171 Updated last year
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in… ☆91 Updated 4 months ago
- Spark RAPIDS plugin - accelerate Apache Spark with GPUs ☆929 Updated last week
- The Internals of Delta Lake ☆186 Updated 8 months ago
- An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads. ☆427 Updated 3 years ago
- ☆311 Updated 6 years ago
- Qubole Sparklens tool for performance tuning Apache Spark ☆583 Updated last year
- A re-implementation of Hadoop DistCP in Apache Spark ☆47 Updated last year
- Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments. ☆283 Updated last month
- Performance Analysis Tool ☆77 Updated 3 months ago
- ☆390 Updated last year
- Star Schema Benchmark Tool for Apache Kylin ☆97 Updated 4 years ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆229 Updated last month
- Resource for the book Trino: The Definitive Guide (and formerly Presto: The Definitive Guide) ☆229 Updated 2 years ago
- LST-Bench is a framework that allows users to run benchmarks specifically designed for evaluating Log-Structured Tables (LSTs) such as De… ☆81 Updated last week
- ACID Data Source for Apache Spark based on Hive ACID ☆97 Updated 4 years ago
- Gluten: Plugin to Boost Trino's Performance ☆75 Updated last year